SOLUTION SETS FOR EQUATIONS OVER FREE GROUPS ARE EDTOL LANGUAGES

LAURA CIOBANU, VOLKER DIEKERT, AND MURRAY ELDER 


Abstract. We show that, given an equation over a finitely generated free
group, the set of all solutions in reduced words forms an effectively
constructible EDTOL language. In particular, the set of all solutions in reduced
words is an indexed language in the sense of Aho. The language characterization
we give, as well as further questions about the existence or finiteness
of solutions, follow from our explicit construction of a finite directed graph
which encodes all the solutions. Our result incorporates the recently invented
recompression technique of Jeż, and a new way to integrate solutions of linear
Diophantine equations into the process.

As a byproduct of our techniques, we improve the complexity from quadratic
nondeterministic space in previous works to NSPACE(n log n) here.


Introduction. In this paper we prove that the set of all solutions, as reduced
words, to an equation in a finitely generated free group or free monoid with
involution, has a description as an EDTOL language. Furthermore, we show that
this description can be computed in NSPACE(n log n), where n is the length of the
equation plus the number of generators of the group or monoid.

We construct a finite graph of singly exponential size, with nodes
labeled by equations of bounded size plus some additional data, and directed edges
corresponding to transformations applied to the equations. More precisely, the
edges are labeled by endomorphisms of a free monoid $C^*$, where $C$ is a finite
alphabet which includes the group or monoid generators. The graph, viewed as a
nondeterministic finite automaton, produces a rational language of endomorphisms
of $C^*$. We show that the set of all such endomorphisms applied to a particular
‘seed’ word gives the full set of solutions to the input equation as reduced words.
Thus, by the definition of Asveld [2], we obtain that the solution set is an EDTOL
language, and therefore an indexed language. Moreover, one can decide whether there
are zero, finitely many, or infinitely many solutions simply by checking whether the
graph is empty and whether it has a directed cycle. Our complexity results concerning
these decision problems are the best known so far; and with respect to space
complexity they might be optimal.

The first algorithmic description of all solutions to a given equation over a free
group is due to Razborov [101 [n]. His description became known as a
Makanin-Razborov diagram, and this concept plays a major role in the positive solution
of Tarski’s conjectures about the elementary theory in free groups [mm]. While
Makanin-Razborov diagrams are also graphs whose edges are labeled by morphisms,
these morphisms are group homomorphisms, and it is infeasible to use this
approach to directly obtain solutions in freely reduced words, as the cancellation
within group elements after applying a homomorphism cannot be controlled. Also,
it is extremely complicated to explicitly produce a Makanin-Razborov diagram for
a given equation, and this has been done only in very few cases (E5|).

A description of solution sets as EDTOL languages was known before only for
quadratic word equations over a free monoid by m; the recent paper [5] did not
aim at giving such a structural result. The present paper builds on the techniques
in [5]; in particular, we make use of Jeż’s recompression method m. There is also
a description of all solutions for a word equation over free monoids by Plandowski
in [19]. His description is given by some graph which can be computed in singly
exponential time, but without the aim to give any formal language characterization.

In this paper we restrict ourselves to equations in free groups or free monoids with
involution, and their solution sets in reduced words. It is possible to generalize our
construction in several directions. First, we can replace the free group by any finitely
generated free product $P = \ast_{1 \le i \le s} F_i$ where each $F_i$ is either a free or
finite group, or a free monoid with arbitrary involutions. Second, we can allow arbitrary
rational constraints for free products. We consider Boolean formulae $\Phi$, where each
atomic formula is either an equation or a rational constraint, written as $X \in L$, where
$L \subseteq P$ is a rational subset. More concretely, let $P$ be a free product as above,
$\Phi$ a Boolean formula over equations and rational constraints, and $\{X_1, \ldots, X_k\}$
any subset of variables. Then the techniques developed in this paper allow us to prove that
$\mathrm{Sol}(\Phi) = \{\sigma(X_1)\# \cdots \#\sigma(X_k) \mid \sigma \text{ solves } \Phi \text{ in reduced words}\}$
is EDTOL. Moreover, there is an algorithm which takes $\Phi$ as input and produces an NFA
$\mathcal{A}$ such that $\mathrm{Sol}(\Phi) = \{\varphi(\#) \mid \varphi \in L(\mathcal{A})\}$.
The algorithm is nondeterministic and uses quasi-linear space in the input size of $\Phi$.
However, these more technical results are beyond the scope of the present paper. They follow
from standard results in the literature and they have been announced in the conference
version of this paper which was presented at ICALP 2015, Kyoto (Japan), July 4 - 10, 2015 [3].
Full proofs are in the corresponding paper on arXiv [3].

Article organisation. In Section 1 we give preliminary definitions and notation.
In Section 2 we state the main result, Theorem 4, that solutions in reduced words to
equations in either a free group or a free monoid with involution are described by a
finite graph or nondeterministic finite automaton (NFA) which can be constructed
in nondeterministic quasi-linear space. The main work of the paper is in Section 3,
which treats the monoid case. We define the NFA in Subsection 3.6 and present the
proofs that the NFA encodes only correct solutions (soundness), and all solutions
(completeness), in Subsections 3.9 and 3.10 respectively. The most complicated
part is the completeness proof, which involves producing a path for a given solution
from initial to final node by alternately expanding and compressing the equation,
ensuring that at all times the size of the equation is bounded so that we stay within
the graph.

Once the monoid case is proved, we follow relatively standard methods in the
subsequent section to reduce the problem of finding solutions in reduced words in a
free group to the monoid case. In the final section we give an explicit example of the
alternating expansion-compression procedure.

We stress that the complicated part of the paper is to prove that the NFA we
construct encodes exactly all solutions; the specification and construction of the
NFA, and hence the EDTOL language description, is extremely simple by contrast.

1. Preliminaries 

1.1. Monoids with involution. An alphabet is a finite set whose elements are
called letters. By $\Gamma^*$ we denote the free monoid over the finite set $\Gamma$. The
elements of a free monoid are called words, and the empty word is denoted by 1. The length
of a word $w$ is denoted by $|w|$, and $|w|_x$ counts how often a symbol $x$ appears in $w$.
Let $M$ be any monoid and $u, v \in M$. We write $u \le v$ if $u$ is a factor of $v$, which
means we can write $v = xuy$ for some $x, y \in M$. We denote the neutral element
in $M$ by 1, and use the notation $\mathrm{id}_{C^*}$ for the neutral element in the monoid of
endomorphisms over a free monoid $C^*$.

An involution on a set $\Gamma$ is a mapping $x \mapsto \overline{x}$ such that
$\overline{\overline{x}} = x$ for all $x \in \Gamma$. For example, the identity map is an
involution. An involution on a monoid must also satisfy
$\overline{xy} = \overline{y}\,\overline{x}$. Any involution on a set $\Gamma$ extends to
$\Gamma^*$: for a word $w = a_1 \cdots a_m$ we let
$\overline{w} = \overline{a_m} \cdots \overline{a_1}$; then $\Gamma^*$ endowed with the
involution is called a free monoid with involution. If $\overline{a} = a$ for all
$a \in \Gamma$ then $\overline{w}$ is simply the word $w$ read from right to left.
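
The extension of an involution from letters to words can be illustrated with a short sketch. The helper name `make_bar` and the encoding of words as Python strings are assumptions made for this illustration only; they are not part of the paper.

```python
def make_bar(letter_involution):
    """Extend an involution on letters (a dict) to words.

    The word a_1 ... a_m is sent to bar(a_m) ... bar(a_1): the word is
    reversed and every letter is replaced by its image.
    """
    def bar(word):
        return "".join(letter_involution[a] for a in reversed(word))
    return bar

# Hypothetical encoding: upper case plays the role of the barred letter.
bar = make_bar({"a": "A", "A": "a", "b": "B", "B": "b"})

# With the identity involution on letters, bar is plain reversal.
reverse = make_bar({"a": "a", "b": "b"})
```

In this sketch `bar(bar(w)) == w` and `bar(u + v) == bar(v) + bar(u)` hold for all words, which are exactly the two defining properties of an involution on a monoid.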

A morphism between sets with involution is a mapping respecting the involution,
and a morphism between monoids with involution is a homomorphism $\varphi : M \to N$
such that $\varphi(\overline{x}) = \overline{\varphi(x)}$. A morphism is an $A$-morphism if
$\varphi(x) = x$ for all $x \in A$ where $A \subseteq M$. In this paper, whenever the term
“morphism” is used, it refers to a mapping which respects the underlying structure,
including the involution. All groups are monoids with involution given by
$\overline{x} = x^{-1}$; and all group homomorphisms are morphisms.

1.2. Free partially commutative monoids. Let $A$ be a finite set with involution.
An independence relation is an irreflexive relation $\theta \subseteq A \times A$ such that
$(x, y) \in \theta \iff (\overline{x}, \overline{y}) \in \theta$. Every independence relation
defines a free partially commutative monoid with involution $M(A, \theta)$ by

\[ M(A, \theta) = A^* / \{xy = yx \mid (x, y) \in \theta\}. \]

These monoids are well-studied in computer science as they form the basic algebraic
model for concurrency, see [3 HI ESI. In mathematics, free partially commutative
groups are commonly referred to as right-angled Artin groups (RAAGs). Their
study has a long history with strong connections to topology and geometric group
theory, see for example [26].

In this paper we will need algorithms for equality and factor testing in free
partially commutative monoids. This can be done very efficiently: for example,
there is a linear time algorithm ([16]) to decide on input $u, w \in A^*$ whether $u \le w$
in $M(A, \theta)$. Here we need the uniform version, as follows: the input is a tuple
$(A, \theta, u, w)$ with $u, w \in A^*$, and the question is whether $u$ is a factor of $w$ in
$M(A, \theta)$. This problem can easily be solved in nondeterministic linear space (which
suffices for our purposes) by the following argument. First, find words $p, q \in A^*$ by
scanning $w$ from left to right and for each position guessing (nondeterministically)
whether the corresponding letter belongs to $p$, $u$ or $q$, requiring that $|puq| = |w|$ (we
do this by marking each letter of the input, which requires linear space). Second, check
that the choice of positions assigned to $u$ produces a word that is indeed equal to
$u$. Third, check whether $puq$ is equal to $w$ in $M(A, \theta)$. For both the second and
third steps we use the “projection lemma” of dSlE]: for example, in the third step
we check that $|puq|_a = |w|_a$ for all $a \in A$, then we check that the projections of $puq$
and $w$ to $\{a, b\}^*$ yield identical words for all $a, b \in A$ such that $ab \ne ba$ in
$M(A, \theta)$. The projections are obtained by ignoring all letters in $puq$ and $w$ which
are not in $\{a, b\}$.
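
The equality test via projections described above can be sketched as follows. This is a minimal illustration, not the linear-time algorithm cited in the text: the function names are hypothetical, words are Python strings, and the independence relation is assumed to be given as a set of unordered pairs of commuting letters (the relation generated by the defining equations $xy = yx$ is symmetric in effect).

```python
from itertools import combinations

def project(word, letters):
    """Erase every letter of `word` that is not in `letters`."""
    return "".join(x for x in word if x in letters)

def equal_in_trace_monoid(u, w, alphabet, independent):
    """Decide u = w in the partially commutative monoid.

    `independent` is a set of frozensets {a, b} of commuting letters
    (an assumed encoding of the independence relation).
    """
    # Step 1: every letter must occur equally often in both words.
    if any(u.count(a) != w.count(a) for a in alphabet):
        return False
    # Step 2: for every pair of non-commuting letters, the projections
    # onto that pair must be identical words.
    for a, b in combinations(sorted(alphabet), 2):
        if frozenset((a, b)) in independent:
            continue
        if project(u, {a, b}) != project(w, {a, b}):
            return False
    return True
```

For instance, with alphabet `"abc"` where only `a` and `c` commute, the words `acb` and `cab` are identified, whereas `abc` and `acb` are not, since their projections to `{b, c}` differ.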

Another fact about partially commutative monoids that we use later is that for
$u \in M(A, \theta)$ the values $|u|$ and $|u|_a$ are well-defined since $|xy|_a = |yx|_a$ for all
$x, y \in A^*$, $a \in A$.

We will define free partially commutative monoids through “types” in
Subsection 3.3, which for simplicity of notation are also denoted by $\theta$.

1.3. Languages. Languages refer traditionally to subsets of finitely generated free
monoids; the class of regular languages can be defined via rational expressions,
nondeterministic finite automata, or recognizability via homomorphisms to finite
monoids, to mention just a few of the possible definitions [18]. These notions
generalize to arbitrary monoids, but lead to different classes, in general.

We define a rational subset in any monoid $M$ by means of a nondeterministic
finite automaton, NFA for short. An NFA is a directed finite graph $\mathcal{A}$ with
initial and final states, where the transitions between states are labeled by
elements of the monoid $M$. We say that $m \in M$ is accepted by the automaton $\mathcal{A}$
if there exists a path from some initial to some final state such that multiplying
the edge labels together in $M$ yields $m$. This defines the accepted language
$L(\mathcal{A}) = \{m \in M \mid m \text{ is accepted by } \mathcal{A}\}$. Then $L \subseteq M$ is
rational if and only if $L$ is accepted by some NFA over $M$ (see 0). An NFA is called trim
if every state is on some path from an initial to a final state. For a trim NFA
$\mathcal{A}$ we have $L(\mathcal{A}) \ne \emptyset$ if and only if $\mathcal{A} \ne \emptyset$.

We say that $L \subseteq M$ is recognizable if there is a homomorphism $\nu : M \to N$ to
a finite monoid $N$ such that $L = \nu^{-1}(\nu(L))$. The family of recognizable subsets
is closed under finite union and complementation (and therefore also under finite
intersection), and therefore forms a Boolean algebra. For finitely generated free
monoids Kleene’s Theorem asserts that a subset is recognizable if and only if it is
rational; and in this context a rational subset is also called regular.

In this paper we are mainly interested in rational subsets of free groups $F(A_+)$,
free monoids $A^*$, and monoids $\mathrm{End}(C^*)$ of endomorphisms over a free monoid $C^*$.
If $|C| \ge 2$, then $\mathrm{End}(C^*)$ is neither free nor finitely generated and it contains
non-trivial finite subgroups.

Suppose we have an NFA where each transition label is an endomorphism in
$\mathrm{End}(C^*)$ which is applied in the opposite direction of the transition. If a path
is labelled by the sequence $h_1, \ldots, h_t$, then we can apply the endomorphism
$h = h_1 \cdots h_t$ to an element $u \in C^*$ and the result is a word
$h(u) = h_1 \cdots h_t(u) \in C^*$. Thus, $\{h(u) \mid h \in L(\mathcal{A})\}$ defines a language
in $C^*$. This leads to the notion of EDTOL, defined next.

1.3.1. EDTOL Languages. The acronym EDTOL refers to Extended, Deterministic,
Table, 0 interaction, and Lindenmayer. There is a vast literature on Lindenmayer
systems, with various acronyms such as DOL, DTOL, ETOL, HDTOL and
so forth. For more background on Lindenmayer systems we refer to [23]. The
subclass EDTOL is equal to HDTOL (see for example [23, Thm. 2.6]), and has
received particular attention. It is a subclass of indexed languages in the sense
of Aho [1], see for example [5]. Indexed languages are context-sensitive, and they
strictly contain all context-free languages. The classes of EDTOL and context-free
languages are incomparable [S] and therefore the inclusion of EDTOL into indexed
languages is proper.

[Figure: the chain regular, EDTOL, ETOL, indexed, context-sensitive, drawn from left
to right, with context-free drawn below the chain.]

Figure 1. Containments of formal language classes. Each edge
from left to right represents strict containment.

We define EDTOL languages in $A^*$ through a characterization (using rational
control) due to Asveld [2], which is the analogue of Ginsburg and Rozenberg’s
result for ETOL languages m Lem. 4.1]). We start with some alphabet $C$ such
that $A \subseteq C$, and a rational set of endomorphisms $\mathcal{R} \subseteq \mathrm{End}(C^*)$.
Note that if $\mathcal{R} \subseteq \mathrm{End}(C^*)$ is any subset of endomorphisms, then we can
apply $\mathcal{R}$ to any word $u \in C^*$ and we obtain a subset
$\{h(u) \mid h \in \mathcal{R}\} \subseteq C^*$.

Definition 1. Let $A$ be an alphabet and $L \subseteq A^*$. We say that $L$ is an EDTOL
language if there is an alphabet $C$ with $A \subseteq C$, a rational set of endomorphisms
$\mathcal{R} \subseteq \mathrm{End}(C^*)$, and a letter $c \in C$ such that
$L = \{h(c) \mid h \in \mathcal{R}\}$.

The set $\mathcal{R}$ is called the rational control, and $C$ the extended alphabet.

Note that for an arbitrary set $\mathcal{R}$ of endomorphisms of $C^*$ we have
$\{h(c) \mid h \in \mathcal{R}\} \subseteq C^*$, but the definition implies that $\mathcal{R}$ must
guarantee $h(c) \in A^*$ for all $h \in \mathcal{R}$.

Example 2. Let $A = \{a, b\}$ and $C = \{a, b, \#\}$. Consider four endomorphisms
$f, g_a, g_b, h$ defined as $f(\#) = \#\#$, $g_a(\#) = a\#$, $g_b(\#) = b\#$, and $h(\#) = 1$,
and on all other letters $f, g_a, g_b, h$ behave like the identity. Consider the rational
language $\mathcal{R} = h\{g_a, g_b\}^* f$ (where endomorphisms are applied right-to-left). A
simple inspection shows that
$\{\varphi(\#) \mid \varphi \in \mathcal{R}\} = \{ww \mid w \in A^*\}$, which is not context-free.
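
The following sketch enumerates a finite part of the language of Example 2 by composing the four endomorphisms along words of the rational control $h\{g_a, g_b\}^* f$. The helper names `endo` and `generate`, and the bound on the number of $g$-steps, are assumptions introduced for the illustration.

```python
from itertools import product

def endo(images):
    """Endomorphism of C* given by the images of some letters.

    Letters without an entry in `images` are mapped to themselves.
    """
    def apply(word):
        return "".join(images.get(ch, ch) for ch in word)
    return apply

f = endo({"#": "##"})
g = {"a": endo({"#": "a#"}), "b": endo({"#": "b#"})}
h = endo({"#": ""})

def generate(max_steps):
    """All words h g_{x_1} ... g_{x_k} f (#) with k <= max_steps."""
    words = set()
    for k in range(max_steps + 1):
        for xs in product("ab", repeat=k):
            word = f("#")              # rightmost endomorphism acts first
            for x in reversed(xs):     # then g_{x_k}, ..., g_{x_1}
                word = g[x](word)
            words.add(h(word))         # h finally erases both markers
    return words
```

Every word produced this way is a square: after `f` the seed is `##`, each `g` step extends both copies of the prefix in lockstep, and `h` removes the markers.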

1.4. Complexity. We use the standard $\mathcal{O}$-notation for functions from $\mathbb{N}$ to
$\mathbb{R}_{\ge 0}$. A function $f$ is called quasi-linear if $f(n) \in \mathcal{O}(n \log n)$. We say
that $f$ is singly exponential if $f(n) \in 2^{\mathcal{O}(p(n))}$ where $p(n)$ is a polynomial. We
also use the standard meaning of complexity classes like NP, NSPACE(f), DSPACE(f) and
DTIME(f) as in [I3.

Let $\mathcal{C}$ and $\mathcal{D}$ be two domains and for each $x \in \mathcal{C} \cup \mathcal{D}$ we
let $\langle x \rangle \in \{0, 1\}^*$ denote some binary encoding. We assume that for every
$x \in \mathcal{C}$ its input size is defined as a natural number which might be different from
the binary length of $\langle x \rangle$. For example, in our case we define the input size of
an equation over a free group or monoid to be the length of the equation plus the number
of generators of the group or monoid. As usual, we omit details on the specific encoding
and how to check that a binary string $y$ is of the form $y = \langle x \rangle$ for some
$x \in \mathcal{C}$. In our case, we content ourselves that the encoding of a word of length $n$
over some alphabet $\Gamma$ uses at most $\mathcal{O}(n \log |\Gamma|)$ bits and that the check
$y = \langle x \rangle$ can be done deterministically in linear space with respect to the
binary length of $y$.

A function $t : \mathcal{C} \to \mathcal{D}$ is computable in NSPACE(f) if there is a
nondeterministic Turing machine $M$ with a two-way read-only input tape, a work tape, and a
write-only output tape. The input $x \in \mathcal{C}$ is given as the binary string
$\langle x \rangle$. During the computation the machine writes some binary string on the
output tape from left to right such that for the entire computation the size of $M$’s work
tape is bounded by $\mathcal{O}(f(n))$ where $n$ is the input size of $x$. There must be at
least one run of the machine where $M$ stops, and if $M$ stops, then the output must be the
correct value $\langle t(x) \rangle$. We rely on a result by Immerman and Szelepcsényi which
implies that NSPACE(f) is (effectively) closed under complementation for functions $f$
satisfying $\log n \in \mathcal{O}(f(n))$ ([Ul Theorem 7.6]). As a consequence, “trimming” an
automaton will become possible in NSPACE(n log n) in Subsection 3.8. Recall that every
NSPACE(n log n)-computable function can also be simulated by some deterministic
algorithm in time $2^{\mathcal{O}(n \log n)}$ (see [El Theorem 3.3]).

1.5. Word equations over monoids with rational constraints. Let $A$ be an
alphabet of constants with involution and let $\pi : A^* \to M$ be a surjective morphism
onto a monoid with involution $M$. Furthermore, let $\mathcal{X}$ be a set of variables. We
may assume that $\mathcal{X}$ is endowed with an involution without fixed points. Thus,
$\overline{X} \ne X$ for all $X \in \mathcal{X}$.

Definition 3. A word equation with rational constraint over $M$ is a pair $(U, V)$ of
words $U, V \in (A \cup \mathcal{X})^*$ which has the following attributes.

• The input size of the equation is defined as $|A| + |UV|$.

• The rational constraint is given by a homomorphism
$\mu : (A \cup \mathcal{X})^* \to N$, where $N$ is a finite monoid.

• A solution of the equation $(U, V)$ with constraint $\mu$ is given by a map
$\sigma : \mathcal{X} \to A^*$ which extends to a homomorphism
$\sigma : (A \cup \mathcal{X})^* \to A^*$ that fixes the constants, such that for all
$X \in \mathcal{X}$:

(1) $\sigma(\overline{X}) = \overline{\sigma(X)}$, i.e. $\sigma : \mathcal{X} \to A^*$ is a morphism,

(2) $\mu(X) = \mu\sigma(X)$, i.e. the solution respects the constraint on $X$,

(3) $\pi\sigma(U) = \pi\sigma(V)$, i.e. $\sigma(U)$ and $\sigma(V)$ are equal in the monoid $M$.

Note that we constrain the solutions to be in a recognizable set (see the definitions
in Subsection 1.3), but in this case the notions of recognizable and rational sets are
the same, since we are in the free monoid $(A \cup \mathcal{X})^*$.

2. Solution sets for equations over free monoids with involution and free groups: the main results

Let $A_\pm = A_+ \cup \{\overline{a} \mid a \in A_+\}$ be a finite alphabet with involution and
assume that the involution is without fixed points: $a \ne \overline{a}$ for all $a \in A_\pm$.
We let $F(A_+)$ be the free group over $A_+$ and we realize the involution inside $F(A_+)$ by
$\overline{a} = a^{-1}$. Thus

\[ A_\pm = A_+ \cup \{a^{-1} \mid a \in A_+\} \subseteq F(A_+) \subseteq A_\pm^*. \]

Following standard terminology, a word $w \in A_\pm^*$ is reduced if it does not contain
any factor $a\overline{a}$ where $a \in A_\pm$. The set of reduced words is a regular subset
$F \subseteq A_\pm^*$ which is closed under involution. We fix $F$ as a set of normal forms for
$F(A_+)$; thus, as a set, we identify $F(A_+)$ with $F$. The inclusion $A_\pm \subseteq F(A_+)$
induces the canonical projection $\pi : A_\pm^* \to F(A_+)$. Given a word $w$ we obtain
$\pi(w)$ by a repeated cancellation of all factors $a\overline{a}$; and $w$ is reduced if and
only if $\pi(w) = w$.
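
A minimal sketch of the canonical projection onto reduced words follows. The encoding is an assumption made for the illustration: lower-case letters stand for generators in $A_+$ and the corresponding upper-case letters for their inverses, so that the involution on letters is `str.swapcase`. The function names are hypothetical.

```python
def reduce_word(word):
    """Cancel all factors a·bar(a) repeatedly, using a stack.

    Assumed encoding: the inverse of a letter is its swapped case.
    """
    stack = []
    for ch in word:
        if stack and stack[-1] == ch.swapcase():
            stack.pop()          # the pair (previous letter, ch) cancels
        else:
            stack.append(ch)
    return "".join(stack)

def is_reduced(word):
    """A word is reduced if and only if the projection fixes it."""
    return reduce_word(word) == word
```

A single left-to-right pass with a stack suffices because every cancellation only exposes the letter directly below the cancelled pair, and the resulting normal form does not depend on the order of cancellations.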

We shall also use a special symbol # which is not in $A_\pm$ and serves as “marker”.
For example, we will encode a system of equations $\{(U_i, V_i) \mid 1 \le i \le s\}$ as a single
equation

(1) $(U_1 \# \cdots \# U_s,\; V_1 \# \cdots \# V_s)$.

If we require that no $\sigma(X)$ is allowed to use #, where $X$ is a variable, then

(2) $\forall i : \pi\sigma(U_i) = \pi\sigma(V_i) \iff \pi\sigma(U_1 \# \cdots \# U_s) = \pi\sigma(V_1 \# \cdots \# V_s)$

since positions of the # letters must be the same on both sides. In our context,
rational constraints are the most convenient way to ensure that no # appears in
$\sigma(X)$, see Subsection 3.2. We let

\[ A = A_\pm \cup \{\#\} \]

with $\overline{\#} = \#$. Thus, $\{1, \#\}$ forms a group which is isomorphic to
$\mathbb{Z}/2\mathbb{Z}$ if we let $\#^{-1} = \#$.

In order to have a uniform statement we let $M(A)$ be either the free monoid with
involution $A^*$ or the free product of the free group $F(A_+)$ with the cyclic group
$\{1, \#\}$ of order 2. Thus, henceforth:

\[ M(A) = A^* \quad \text{or} \quad M(A) = A^* / \{a\overline{a} = 1 \mid a \in A\}, \]

and $\pi : A^* \to M(A)$ is the canonical projection induced by the inclusion
$A \subseteq M(A)$. In both cases $\pi$ is injective on $F \subseteq A^*$, and if $M(A) = A^*$,
then $\pi$ is just the identity.

Given a word equation $(U, V)$ with $UV \in (A_\pm \cup \mathcal{X})^*$ over $M(A)$, we say that a
solution $\sigma$ is a solution in reduced words if $\sigma(X) \in F$ for all
$X \in \mathcal{X}$. We will realize this condition as a rational constraint $\mu$ into a finite
monoid $N$ with a zero element $0 \in N$ such that $\mu(w) \ne 0$ if and only if $w \in F$.

Theorem 4. Let $(U, V)$ be an equation over $M(A)$ of input size $n = |A| + |UV|$
(according to Definition 3) and in variables
$X_1, \overline{X_1}, \ldots, X_m, \overline{X_m}$. Then there is an
NSPACE(n log n) algorithm which computes $c_1, \ldots, c_m \in C$, where $C \supseteq A$ is an
extended alphabet of size $|C| \in \mathcal{O}(n)$, and a trim NFA $\mathcal{A}$ which produces
the set of solutions in reduced words. That is,

\[ \{(\sigma(X_1), \ldots, \sigma(X_m)) \in F \times \cdots \times F \mid \pi\sigma(U) = \pi\sigma(V)\} \]
\[ = \{(h(c_1), \ldots, h(c_m)) \in C^* \times \cdots \times C^* \mid h \in L(\mathcal{A})\}. \]

The NFA has the following properties.

(1) It is nonempty if and only if the equation $(U, V)$ has some solution.

(2) It has a directed cycle if and only if $(U, V)$ has infinitely many solutions.

These properties can also be decided in NSPACE(n log n).

Recall that the input size n used in the statement of the theorem might be 
smaller than the length of some binary encoding for the input. If the number of 
distinct symbols used in the equation is constant, then our algorithm is quasilinear 
in the input size; if, on the other hand, the number of distinct symbols used in the 
equation is linear, then we need linear space, only. 

Theorem 4 yields the characterization of solution sets as EDTOL languages.
To do so, we identify a tuple of words $(w_1, \ldots, w_k) \in F^k$ with the single word
$w_1 \# \cdots \# w_k \in A^*$.

Let $(U, V)$ be an equation as in Theorem 4. For any subset
$\mathcal{Z} = \{Z_1, \ldots, Z_k\}$ of variables appearing in $UV$ we define the solution set as

(4) $\mathrm{Sol}_{\mathcal{Z}}(U, V) = \{\sigma(Z_1) \# \cdots \# \sigma(Z_k) \mid \sigma \text{ solves } (U, V) \text{ in reduced words}\}$.

Note that for $k = 0$ we have $\mathrm{Sol}_{\emptyset}(U, V) = \emptyset$ if the equation $(U, V)$
has no solution and $\mathrm{Sol}_{\emptyset}(U, V) = \{1\}$ otherwise. Considering subsets of
variables allows for some flexibility. In particular, we can introduce auxiliary variables
which do not impact the solution set. If, however, every variable occurring in $UV$ is either
of the form $Z_i$ or $\overline{Z_i}$ for some $1 \le i \le k$, then we say that
$\mathrm{Sol}_{\mathcal{Z}}(U, V)$ is a full solution set.

Corollary 5. Let $(U, V)$ be an equation as in Theorem 4 and let
$\mathcal{Z} = \{Z_1, \ldots, Z_k\}$ be any subset of variables appearing in $UV$. Then
$\mathrm{Sol}_{\mathcal{Z}}(U, V)$ is an EDTOL language. More precisely, if $\mathcal{A}$ is the trim
NFA constructed in Theorem 4, then we can find $c'_1, \ldots, c'_k \in C$ such that

\[ \mathrm{Sol}_{\mathcal{Z}}(U, V) = \{h(c'_1 \# \cdots \# c'_k) \mid h \in L(\mathcal{A})\}. \]

In particular, the full solution set is EDTOL.

Proof. The language characterization follows from Definition 1 of an EDTOL
language, given that each $Z_j$ corresponds to some $X_i$ in Theorem 4. □

Note that Theorem 4 shifts the traditional perspective from solving an equation
to an effective construction of some NFA producing an EDTOL set. Once the NFA
is constructed, the existence of a solution, or whether the number of solutions
in reduced words is zero, finite or infinite, becomes a graph property of the NFA.
Thus, the algorithmic difficulty of solving equations and describing their solution
set reduces to the complexity of building a nondeterministic finite automaton for a
given input.
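
The two graph properties can be checked with elementary graph search, as in the sketch below. It is only an illustration under stated assumptions: the NFA is given explicitly as a list of labelled edges `(source, label, target)`, the function names are hypothetical, and the correspondence between directed cycles and infinitely many solutions is the property that Theorem 4 asserts for the specific NFA of the paper, not a fact about arbitrary automata. The sketch also ignores the space bound of the theorem.

```python
def trim(edges, initial, final):
    """Keep only states on some path from an initial to a final state."""
    def closure(start, step):
        seen, todo = set(start), list(start)
        while todo:
            s = todo.pop()
            for t in step(s):
                if t not in seen:
                    seen.add(t)
                    todo.append(t)
        return seen

    reachable = closure(initial, lambda s: [d for (p, _, d) in edges if p == s])
    coreachable = closure(final, lambda s: [p for (p, _, d) in edges if d == s])
    keep = reachable & coreachable
    kept_edges = [(p, l, d) for (p, l, d) in edges if p in keep and d in keep]
    return kept_edges, keep

def has_cycle(edges, states):
    """Depth-first search for a directed cycle."""
    succ = {s: [] for s in states}
    for p, _, d in edges:
        succ[p].append(d)
    WHITE, GREY, BLACK = 0, 1, 2
    colour = dict.fromkeys(states, WHITE)

    def visit(s):
        colour[s] = GREY
        for t in succ[s]:
            if colour[t] == GREY or (colour[t] == WHITE and visit(t)):
                return True
        colour[s] = BLACK
        return False

    return any(colour[s] == WHITE and visit(s) for s in states)

def classify(edges, initial, final):
    """Return 'zero', 'finite' or 'infinite' for the trimmed automaton."""
    kept_edges, keep = trim(edges, initial, final)
    if not keep:
        return "zero"
    return "infinite" if has_cycle(kept_edges, keep) else "finite"
```

Trimming first matters: a cycle that is not on any path from an initial to a final state contributes nothing to the accepted language and must not be counted.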


3. Proof of Theorem 4 in the monoid case: $M(A) = A^*$

In this section we prove Theorem 4 in the monoid case. Before delving into
the proof, we introduce in Subsections 3.1–3.7 further necessary terminology and
notation.

Let $M(A) = A^*$. In this case $\pi = \mathrm{id}_{A^*}$ and so $\pi$ is not needed in the rest of
this section. Without restriction, we may assume $|A_+| \ge 1$.

Let $\mathcal{X}_{\mathrm{init}} = \{X_1, \overline{X_1}, \ldots, X_m, \overline{X_m}\}$ be the initial
set of variables, that is, for each $1 \le i \le m$ either $X_i$ or $\overline{X_i}$ occurs in $UV$.

Let $\kappa \in \mathcal{O}(1)$ be some “large enough” constant, whose exact value will be
discussed in Subsection 3.10.4, and choose an alphabet $C$ of constants and an alphabet
$\Omega$ of variables such that

\[ C \supseteq A,\ |C| = \kappa \cdot n \quad \text{and} \quad \Omega \supseteq \mathcal{X}_{\mathrm{init}},\ |\Omega| = 6n. \]

Fix $\Gamma = C \cup \Omega$. We assume that $C$ and $\Omega$ are sets with involution and that,
inside $\Gamma = C \cup \Omega$, the marker # is the only self-involuting symbol. Thus,
$\overline{\#} = \#$ and $\overline{x} \ne x$ for all $x \in \Gamma \setminus \{\#\}$.

By $\Sigma$ we denote the set of $C$-morphisms $\sigma : \Gamma^* \to C^*$. Every solution will be
drawn from $\Sigma$.

3.1. The initial word equation $W_{\mathrm{init}}$. For technical reasons we need that for
every variable $X_i$ which appears in $UV$ there is some factor $\# X_i \#$ appearing in
the initial equation. Instead of viewing equations as equalities between two words
$U$ and $V$, we will treat equations as a statement about a single word $W \in \Gamma^*$, as
follows. This will require us to redefine the notion of solution as well.

We define the initial equation $W_{\mathrm{init}} \in (A \cup \mathcal{X}_{\mathrm{init}})^*$ as:

(5) $W_{\mathrm{init}} = \# X_1 \# \cdots \# X_m \# U \# V \# \overline{U} \# \overline{V} \# \overline{X_m} \# \cdots \# \overline{X_1} \#$.

Then for every $\sigma \in \Sigma$ we have

\[ \sigma(U) = \sigma(V) \iff \sigma(W_{\mathrm{init}}) = \overline{\sigma(W_{\mathrm{init}})} \]

and

\[ \{(\sigma(X_1), \ldots, \sigma(X_m)) \in F \times \cdots \times F \mid \sigma \in \Sigma \wedge \sigma(U) = \sigma(V)\} \]
\[ = \{(\sigma(X_1), \ldots, \sigma(X_m)) \in F \times \cdots \times F \mid \sigma \in \Sigma \wedge \sigma(W_{\mathrm{init}}) = \overline{\sigma(W_{\mathrm{init}})}\}. \]

We have the following symmetry: if $w \le W_{\mathrm{init}}$ is a factor and no # appears in
$w$, then $\overline{w} \le W_{\mathrm{init}}$, too. The number of # letters in $W_{\mathrm{init}}$ is odd,
and there is a distinguished # exactly in the middle of $W_{\mathrm{init}}$.

Observe that $W_{\mathrm{init}}$ is longer than $UV$, but clearly linear in $n$. More concretely,
since $m \le |UV|$ and $n = |UV| + |A| > |UV| + 1$, we get the bound:

(6) $|W_{\mathrm{init}}| \le 4m + 5 + 2 \cdot |UV| \le 6 \cdot |UV| + 5 < 6(|UV| + 1) < 6n$.

Also observe that + 2 \UV\ < 4n. 
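
The construction of the initial word and the reformulation of "σ solves (U, V)" as a statement about a single self-involuting word can be sketched as follows. The sketch assumes the reading of (5) as the word that lists the blocks $X_1, \ldots, X_m, U, V, \overline{U}, \overline{V}, \overline{X_m}, \ldots, \overline{X_1}$ separated and surrounded by #, which agrees with the length count $4m + 5 + 2|UV|$ in (6). The encoding (symbols as strings, a trailing apostrophe for the barred symbol) and all function names are hypothetical.

```python
def bar_sym(s):
    """Involution on symbols: '#' is fixed, otherwise toggle a trailing '."""
    if s == "#":
        return s
    return s[:-1] if s.endswith("'") else s + "'"

def bar(word):
    """Involution on words: reverse and bar every symbol."""
    return [bar_sym(s) for s in reversed(word)]

def w_init(U, V, variables):
    """Blocks X_1..X_m, U, V, bar(U), bar(V), bar(X_m)..bar(X_1), framed by #."""
    blocks = ([[X] for X in variables] + [U, V, bar(U), bar(V)]
              + [[bar_sym(X)] for X in reversed(variables)])
    word = ["#"]
    for block in blocks:
        word += block + ["#"]
    return word

def substitute(sigma, word):
    """Apply sigma (given on unbarred variables) as a morphism with involution."""
    out = []
    for s in word:
        if s in sigma:
            out += sigma[s]
        elif bar_sym(s) in sigma:
            out += bar(sigma[bar_sym(s)])
        else:
            out.append(s)          # constants and '#' are fixed
    return out
```

For the equation $(Xa, aX)$ the assignment $X \mapsto aa$ is a solution and makes the image of the initial word equal to its own involution, while $X \mapsto b$ is not a solution and breaks the symmetry.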


3.2. The finite monoid $N_F$. In order to ensure that solutions are in reduced
words which do not contain the symbol #, we introduce a morphism to a fixed
finite monoid which plays the role of (a specific) rational constraint. We define
$N_F$ as follows: $N_F = \{1, 0\} \cup (A_\pm \times A_\pm)$ with multiplication given by
$1 \cdot x = x \cdot 1 = x$, $0 \cdot x = x \cdot 0 = 0$, and

\[ (a, b) \cdot (c, d) = \begin{cases} (a, d) & \text{if } b \ne \overline{c}, \\ 0 & \text{otherwise.} \end{cases} \]

The monoid $N_F$ has a natural involution given by $\overline{1} = 1$, $\overline{0} = 0$, and
$\overline{(a, b)} = (\overline{b}, \overline{a})$.

The morphisms to $N_F$ are defined on subsets of $\Gamma$, and although they change
during the algorithm, they always extend the following fixed morphism

\[ \mu_0 : A^* \to N_F \]

which is defined by

\[ \mu_0(\#) = 0, \quad \mu_0(a) = (a, a) \]

for $a \in A_\pm$. It is clear that $\mu_0$ respects the involution and $\mu_0(w) = 0$ if and only
if either $w$ contains # or $w$ is not reduced. If, on the other hand, $1 \ne w \in A_\pm^*$ is
reduced, then $\mu_0(w) = (a, b)$ where $a$ is the first and $b$ the last letter of $w$. An
additional feature is that $\mu_0(w) = 1$ if and only if $w$ is the empty word.

Defining $\mu(X)$ for a variable $X$ has the following meaning for a solution $\sigma$ with
$\sigma(X) \in A_\pm^*$: the value $\mu(X) = 0$ is not possible in any solution, $\mu(X) = 1$ implies
$\sigma(X) = 1$, and $\mu(X) = (a, b)$ implies $\sigma(X) \in F \cap aA_\pm^* \cap A_\pm^* b$.
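
The finite monoid and the fixed morphism can be tried out with a small sketch. It assumes the multiplication rule "$(a, b) \cdot (c, d) = (a, d)$ unless $b$ is the involution of $c$", and a hypothetical encoding in which the involution of a letter is its swapped case; the names `ZERO`, `ONE`, `mul` and `mu0` are introduced here for illustration.

```python
ZERO, ONE = "0", "1"

def inv(a):
    """Assumed involution on letters: swap case."""
    return a.swapcase()

def mul(x, y):
    """Multiplication in the monoid {1, 0} ∪ (letters × letters)."""
    if x == ONE:
        return y
    if y == ONE:
        return x
    if x == ZERO or y == ZERO:
        return ZERO
    (a, b), (c, d) = x, y
    # The product of words ending in b and starting with c stays reduced
    # exactly when c is not the inverse of b.
    return (a, d) if b != inv(c) else ZERO

def mu0(word):
    """Image of a word: '#' goes to 0, a letter a goes to (a, a)."""
    result = ONE
    for ch in word:
        result = mul(result, ZERO if ch == "#" else (ch, ch))
    return result
```

The image of a nonempty reduced word records only its first and last letter, which is all that is needed to decide whether a concatenation of reduced words is again reduced.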

3.3. Types. Later in the proof we will need to perform compression of large blocks
of letters in an efficient manner. This will be achieved by putting a partially
commutative structure on the monoid we work with. The partial commutativity
will be induced by types, which we introduce below. The basic idea is that we assign
a variable $X$ the “type” $\theta(X) = c$ when we predict that in some solution
$\sigma(X) \in c^*$ (so $X$ and $c$ commute), and we assign a constant $b$ the “type”
$\theta(b) = c$ when we rename some letters $b$ as $c$.

Besides the initial alphabet $A$ and the global alphabet $C$, we also need a current
alphabet of constants $B$, where $A \subseteq B = \overline{B} \subseteq C$, and a current set of
variables $\mathcal{X} = \overline{\mathcal{X}} \subseteq \Omega$. Let $\Delta = B \cup \mathcal{X}$. A type
is a partially defined function $\theta : (\Delta \setminus A) \to (B \setminus A)$ which respects
the involution. We identify $\theta$ with the relation
$\{(\theta(x), x) \in \Delta \times \Delta \mid \theta(x) \text{ is defined}\}$. We obtain an
independence relation

\[ \theta = \{(\theta(x), x) \in \Delta \times \Delta \mid \theta(x) \text{ is defined for } x\} \]

and hence a free partially commutative monoid

\[ M(\Delta, \theta) = \Delta^* / \{x\theta(x) = \theta(x)x \mid \theta(x) \text{ is defined for } x\}. \]

If the domain where $\theta$ is defined is empty, then $M(\Delta, \theta) = M(\Delta, \emptyset)$ is
the free monoid $\Delta^*$.

Remark 6. By definition, the size $|\theta|$ is bounded by $|\Delta|$. Hence, it is linear in $n$
and the specification of $\theta$ needs $\mathcal{O}(n \log n)$ bits.

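Definition 7. Let $B$ satisfy $A \subseteq B = \overline{B} \subseteq C$, let
$\mathcal{X} = \overline{\mathcal{X}} \subseteq \Omega$, and let $\theta$ be a type. The notation

\[ M(B, \mathcal{X}, \theta, \mu) \]

denotes the free partially commutative monoid with involution
$M(B \cup \mathcal{X}, \theta)$, equipped with a morphism
$\mu : M(B \cup \mathcal{X}, \theta) \to N_F$ such that $\mu(a) = \mu_0(a)$ for all $a \in A$, where
$\mu_0 : A^* \to N_F$ is the morphism specified in Subsection 3.2. We call
$M(B, \mathcal{X}, \theta, \mu)$ a structured monoid.

A morphism $\varphi$ from $M(B, \mathcal{X}, \theta, \mu)$ to $M(B', \mathcal{X}', \theta', \mu')$ is a
morphism of monoids with involution
$\varphi : M(B, \mathcal{X}, \theta, \mu) \to M(B', \mathcal{X}', \theta', \mu')$ such that
$\mu'\varphi = \mu$.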

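Definition 7 implies that whenever $\theta(x)$ is defined, then
$\mu(x\theta(x)) = \mu(\theta(x)x)$ (because $\mu$ is a homomorphism). Henceforth we use the
following conventions. If $B' \subseteq B$ and $\mathcal{X}' \subseteq \mathcal{X}$ with
$A \subseteq B' = \overline{B'}$ and $\mathcal{X}' = \overline{\mathcal{X}'}$, then
$M(B', \mathcal{X}', \theta, \mu)$ denotes the structured monoid
$M(B', \mathcal{X}', \theta', \mu')$ where $\theta'$ and $\mu'$ are induced by
the restrictions of $\theta$ and $\mu$ to $B' \cup \mathcal{X}'$. Moreover, if
$M(B, \mathcal{X}, \theta, \mu)$ is known from the context, then we abbreviate
$M(B, \emptyset, \theta, \mu)$ as $M(B)$. Since no letter from $A$ is involved in a type, $M(A)$
is the free monoid with involution $A^*$ together with the morphism
$\mu_0 : A^* \to N_F$, and

\[ M(A) = M(A, \emptyset, \emptyset, \mu_0) \subseteq M(B) \subseteq M(B, \mathcal{X}, \theta, \mu) \xrightarrow{\ \mu\ } N_F. \]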

3.4. Reference list of symbols. In Table 1 we summarise notations introduced
so far for easy reference. These conventions hold unless stated otherwise. They
also apply to “primed” symbols such as $B'$, where $B'$ denotes a set with
$A \subseteq B' = \overline{B'} \subseteq C$.

$A_+ \subseteq A_\pm$, the initial alphabets without self-involuting letters.

$A_\pm \cup \{\#\} = A \subseteq B = \overline{B} \subseteq C$.

$\Gamma = C \cup \Omega$, and $x = \overline{x} \in \Gamma$ implies $x = \#$.

$\mathcal{X} = \overline{\mathcal{X}} \subseteq \Omega$, the current set of variables.

$n = |A| + |UV|$, $|C| = \kappa n$ and $|\Omega| = 6n$.

$\Delta = B \cup \mathcal{X}$.

$\mu : \Delta \to N_F$, a morphism with $\mu(a) = \mu_0(a)$ for $a \in A$.

$\theta : (\Delta \setminus A) \to (B \setminus A)$, the type defining an independence relation.

$M(\Delta, \theta)$, free partially commutative monoid defined by $\Delta$ and $\theta$.

$M(B, \mathcal{X}, \theta, \mu) = M(\Delta, \theta)$ together with $\mu$ which extends $\mu_0 : A^* \to N_F$.

$M(B)$, submonoid of $M(B, \mathcal{X}, \theta, \mu)$ together with the restriction of $\theta$, $\mu$.

$a, b, c, \ldots$ refer to letters in $C$.

$u, v, w, \ldots$ refer to words in $C^*$.

$X, Y, Z, \ldots$ refer to variables in $\Omega$.

$x, y, z, \ldots$ refer to words in $\Gamma^*$.


Table 1. Reference list of symbols.


3.5. Extended equations and their solutions. The states of the NFA we are
going to construct correspond to equations derived from our initial equation. Each
state contains such an equation, together with the specification of which set of
constants, variables and types are used. Moreover, we keep track of the morphism $\mu$
which represents the constraint. Formally, we use the notion of extended equation.
The notions we introduce now are quite technical, but the reader should keep
in mind that the most important fact is that an extended equation contains an
equation which is a modification of the initial equation, and this equation has
bounded length. When types are present, this equation is an element in a free
partially commutative monoid rather than simply a word in a free monoid.

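Definition 8. An extended equation is a tuple $(W, B, \mathcal{X}, \theta, \mu)$, where $W$ is a
word in $(B \cup \mathcal{X})^*$ such that:

(1) $|W| \le 204n$.

(2) If $\theta = \emptyset$, then $\sum_{X \in \mathcal{X}} |W|_X \le 4n$. Otherwise $\sum_{X \in \mathcal{X}} |W|_X \le \ldots$

(3) $|W|_\# = |W_{\mathrm{init}}|_\#$ and $W \in \#(B \cup \mathcal{X})^*\#$.

(4) Every $x$ with $\# \ne x \in B \cup \mathcal{X}$ satisfies $\mu(x) \ne 0$.

(5) Every $X \in \mathcal{X}$ appears in $W$.

(6) If $x \le W$ is a factor with $|x|_\# = 0$, then $\overline{x} \le W$, too.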

Remark 9. As noted above, the word $W$ (including the notion of factor) is
to be seen as representing an element in the free partially commutative monoid
$M(B, \mathcal{X}, \theta, \mu) = M(B \cup \mathcal{X}, \theta)$. Note that by definition $|\theta| \le |B \cup \mathcal{X}|$
(see the corresponding remark above).
The bounds on the length of $W$, and on the number of variables appearing in $W$,
will be explained in later sections (Subsection 3.10), where we will show that we
can find all solutions to an input equation by considering modified equations that
satisfy these restrictions. What is important for now is that $|W| \in O(n)$, which
means the number of extended equations is finite.
Definition 10. Let $V = (W, B, \mathcal{X}, \theta, \mu)$ be an extended equation. The weight $\|V\|$
of $V$ is a 4-tuple of natural numbers, $\|V\| = (\omega_1, \omega_2, \omega_3, \omega_4)$, where

$\omega_1 = |W|$,

$\omega_2 = |W| - |\{a \in B \mid |W|_a \ge 1\}|$,

$\omega_3 = |W| - |\theta|$,

$\omega_4 = |B|$.

Remark 11. We order tuples in $\mathbb{N}^4$ lexicographically. The lexicographic ordering
is chosen to function as follows. If we start at an equation of high weight, then the
weight of the equation reduces by “compression”. The first component gives more
weight to longer equations. If two equations have the same length, then we declare
the equation in which more distinct constants appear to be smaller because the
term $|\{a \in B \mid |W|_a \ge 1\}|$ appears with a negative sign. If two equations have the
same length and use the same number of distinct constants, we declare the equation
in which more symbols are typed to be smaller. Finally, if both equations have the
same length, the same number of distinct letters in use, and the same number of
typed symbols, then we declare the equation defined over the smaller set $B$ to be
smaller.
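
The weight of Definition 10 and the ordering of Remark 11 can be illustrated by a small sketch. The representation is an assumption made only for this illustration: the word W is a list of symbols, B is a set of constants, and the type is a dictionary; the function name `weight` is hypothetical.

```python
from collections import Counter

def weight(W, B, theta):
    """Return the 4-tuple (w1, w2, w3, w4) of Definition 10.

    W     -- the equation, as a list of symbols (illustrative encoding)
    B     -- the current set of constants
    theta -- the type, as a dict from typed symbols to constants
    """
    counts = Counter(W)
    # Number of distinct constants of B that actually occur in W.
    used_constants = sum(1 for a in B if counts[a] >= 1)
    return (len(W), len(W) - used_constants, len(W) - len(theta), len(B))

# Python compares tuples lexicographically, which matches Remark 11.
W = ["#", "a", "X", "#", "b", "#"]
v_small = weight(W, {"#", "a", "b"}, {})
v_large = weight(W, {"#", "a", "b", "c"}, {})
```

Here the two equations agree in the first three components and differ only in the size of B, so the one over the smaller alphabet compares as smaller.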

Since for every extended equation we have a current alphabet $B$, we need the
notion of a $B$-solution, which can then be extended to a solution over the desired
alphabet $A$. The next few pages are somewhat technical, but will be used to justify
that when we modify extended equations in certain ways, solutions are preserved.

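Definition 12. Let $V = (W, B, \mathcal{X}, \theta, \mu)$ be an extended equation.

• A $B$-solution at $V$ is a $B$-morphism $\sigma : M(B, \mathcal{X}, \theta, \mu) \to M(B, \emptyset, \emptyset, \mu)$ such
that $\sigma(W) = \sigma(\overline{W})$ and $\sigma(X) \in y^*$ whenever $(X, y) \in \theta$.

• A solution at $V$ is a pair $(\alpha, \sigma)$ where $\sigma$ is a $B$-solution and
$\alpha : M(B, \emptyset, \emptyset, \mu) \to A^*$ is an $A$-morphism (which implies $\mu = \mu_0\alpha$). Moreover, if the set $\mathcal{X}$ in
$V$ is nonempty, then we require that $\alpha$ is nonerasing, that is, $\alpha(a) \ne 1$ for
all $a \in B$.

The weights $\|\alpha, \sigma\|$ and $\|\alpha, \sigma, V\|$ of a solution $(\alpha, \sigma)$ at $V$ are defined as

(7) $\|\alpha, \sigma\| = \sum_{X \in \mathcal{X}} |\alpha\sigma(X)| \in \mathbb{N}$,

(8) $\|\alpha, \sigma, V\| = (\|\alpha, \sigma\|, \|V\|) \in \mathbb{N}^5$.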
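
As an illustration of formula (7), the following sketch computes the weight of a solution. It assumes, for this example only, that morphisms are given as dictionaries from letters to lists of letters (letters not listed are left invariant) and that the involution of a letter is written with a leading `~`; the names `apply_morphism` and `solution_weight` are hypothetical.

```python
def apply_morphism(h, word):
    """Apply the monoid morphism given letterwise by dict h (identity elsewhere)."""
    out = []
    for x in word:
        out.extend(h.get(x, [x]))
    return out

def solution_weight(alpha, sigma, variables):
    """Sum of |alpha(sigma(X))| over all variables X, as in formula (7)."""
    return sum(len(apply_morphism(alpha, sigma[X])) for X in variables)

# The variable set is closed under involution, so both X and ~X are listed.
variables = ["X", "~X"]
sigma = {"X": ["c", "a"], "~X": ["~a", "~c"]}
alpha = {"c": ["a", "b"], "~c": ["~b", "~a"]}
w = solution_weight(alpha, sigma, variables)
```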


Remark 13. Let $V = (W, B, \mathcal{X}, \theta, \mu)$ be an extended equation with a solution
$(\alpha, \sigma)$. Then $\sigma(X)$ cannot have any factor of the form $\#$ or $a\overline{a}$ with $a \in B$ because
$0 \ne \mu(X) = \mu_0\alpha\sigma(X)$. In particular, $\alpha\sigma(X)$ is a reduced word in $A_\pm^*$. Hence,
$\alpha\sigma$ satisfies the constraint $\alpha\sigma(X) \in F$. Note that a priori we don’t exclude the
possibility that factors $a\overline{a}$ appear in $W$, since for example it could be that $W_{\mathrm{init}}$
contains a factor $aX$ and some solution $\sigma(X)$ begins with $\overline{a}$.
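
The reducedness condition in Remark 13 is easy to test directly. The sketch below assumes words over $A_\pm$ are lists of letters and that the involution toggles a leading `~`; both conventions and the function names are illustrative assumptions, not notation from the text.

```python
def bar(x):
    """Involution on letters of A_pm: toggle a leading '~'."""
    return x[1:] if x.startswith("~") else "~" + x

def is_reduced(word):
    """True if no letter is immediately followed by its involution."""
    return all(bar(x) != y for x, y in zip(word, word[1:]))
```

For instance, `is_reduced(["a", "a", "b"])` holds, while a word containing the factor `a ~a` is rejected.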

The next two lemmas show how morphisms between structured monoids trans¬ 
form solutions of extended equations. These two lemmas will play an important 
role in the proof of the algorithm “soundness”. 
In the first lemma we consider the morphisms which leave all constants invariant,
and conclude that such a morphism does not increase the weight of a solution. In addition,
this lemma specifies a situation, in part (iv), when the weight strictly decreases.

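Lemma 14. Let $V = (W, B, \mathcal{X}, \theta, \mu)$ and $V' = (W', B, \mathcal{X}', \theta', \mu')$ be extended equations
such that $\theta(a) = \theta'(a)$ and $\mu(a) = \mu'(a)$ for all $a \in B$. In other words,

\[ M(B) = M(B, \emptyset, \theta, \mu) = M(B, \emptyset, \theta', \mu'). \]

Let $\tau : M(B, \mathcal{X}, \theta, \mu) \to M(B, \mathcal{X}', \theta', \mu')$ be a $B$-morphism such that $W' = \tau(W)$
and $\alpha : M(B) \to M(A, \emptyset, \emptyset, \mu_0)$ be an $A$-morphism such that $\alpha(a) \ne 1$ for all $a \in B$.

Given a $B$-solution $\sigma'$ at $V'$, define a $B$-morphism $\sigma : M(B, \mathcal{X}, \theta, \mu) \to M(B)$
by $\sigma(X) = \sigma'\tau(X)$.

Then the following assertions hold.

(i) $(\alpha, \sigma)$ is a solution at $V$ and $(\alpha, \sigma')$ is a solution at $V'$.

(ii) $\alpha\sigma(W) = \alpha\sigma'(W')$.

(iii) $\|\alpha, \sigma\| \ge \|\alpha, \sigma'\|$.

(iv) If there is some $X$ with $\tau(X) \in \mathcal{X}'^* a \mathcal{X}'^*$ where $a \in B$ and $\alpha(a) \ne 1$, then
$\|\alpha, \sigma\| > \|\alpha, \sigma'\|$.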

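Proof. (i) Since $\sigma'$ is a $B$-solution at $V'$ we have

\[ \sigma(W) = \sigma'\tau(W) = \sigma'(W') = \sigma'(\overline{W'}) = \sigma'\tau(\overline{W}) = \sigma(\overline{W}). \]

By hypothesis, $\alpha(a) \ne 1$ for all $a \in B$. Hence, $(\alpha, \sigma)$ is a solution at $V$.
Since $M(B) = M(B, \emptyset, \theta, \mu) = M(B, \emptyset, \theta', \mu')$, we have that $(\alpha, \sigma')$ is a solution
at $V'$.

(ii) The assertion $\alpha\sigma(W) = \alpha\sigma'(W')$ is trivial since $W' = \tau(W)$ and $\sigma = \sigma'\tau$.

(iii) For each $X$ write $\tau(X)$ as a word

\[ \tau(X) = x_{X,1} \cdots x_{X,\ell_X} \]

with $x_{X,i} \in B \cup \mathcal{X}'$. Since every $X' \in \mathcal{X}'$ appears somewhere in $\tau(W)$ (by
Definition 8(5)) we obtain $\mathcal{X}' \subseteq \{x_{X,i} \mid X \in \mathcal{X} \wedge 1 \le i \le \ell_X\}$. Hence

(9) $\|\alpha, \sigma\| = \sum_{X \in \mathcal{X}} |\alpha\sigma(X)| = \sum_{X \in \mathcal{X}} |\alpha\sigma'\tau(X)|$

(10) $= \sum_{X \in \mathcal{X}} |\alpha\sigma'(x_{X,1} \cdots x_{X,\ell_X})| = \sum_{X \in \mathcal{X},\, 1 \le i \le \ell_X} |\alpha\sigma'(x_{X,i})|$

(11) $\ge \sum_{X' \in \mathcal{X}'} |\alpha\sigma'(X')| = \|\alpha, \sigma'\|$.

(iv) If there is some $X$ with $\tau(X) \in \mathcal{X}'^* a \mathcal{X}'^*$ where $a \in B$ and $\alpha(a) \ne 1$, then
some $x_{X,i} = a \notin \mathcal{X}'$ with $\alpha\sigma'(a) = \alpha(a) \ne 1$. Hence, $|\alpha\sigma'(x_{X,i})| \ge 1$, and
the $\ge$ in (11) becomes the strict inequality $>$.

□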

In the second lemma we consider the morphisms which leave all variables invari¬ 
ant, and conclude that such a morphism does not change the weight of a solution. 

Lemma 15. Let $V = (W, B, \mathcal{X}, \theta, \mu)$ and $V' = (W', B', \mathcal{X}, \theta', \mu')$ be extended
equations, $h : M(B', \mathcal{X}, \theta', \mu') \to M(B, \mathcal{X}, \theta, \mu)$ be an $(A \cup \mathcal{X})$-morphism, and
$\alpha : M(B) \to M(A, \emptyset, \emptyset, \mu_0)$ be an $A$-morphism where $M(B) = M(B, \emptyset, \theta, \mu)$ such
that the following conditions are satisfied.

• $W = h(W')$.

• $\alpha(a) \ne 1$ for all $a \in B$.

• If $\mathcal{X} \ne \emptyset$, then $h(a') \ne 1$ for all $a' \in B'$.

• If $\theta(X) = c \in B$ for some $X \in \mathcal{X}$, then $c \in B'$, $\theta'(X) = c$, and $h(c) \in c^*$.

Given a $B'$-solution $\sigma'$ at $V'$, define a $B$-morphism $\sigma : M(B, \mathcal{X}, \theta, \mu) \to M(B)$
by $\sigma(X) = h\sigma'(X)$. Then $(\alpha, \sigma)$ is a solution at $V$ and $(\alpha h, \sigma')$ is a solution at $V'$.
Moreover, $\alpha\sigma(W) = \alpha h\sigma'(W')$ and

\[ \|\alpha, \sigma\| = \|\alpha h, \sigma'\|. \]

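Proof. By definition, $\mu h = \mu'$ and $\mu_0\alpha = \mu$. Hence $(\alpha h, \sigma')$ is a solution at $V'$.
Now, $h(X) = X$ for all $X \in \mathcal{X}$. Hence, $\sigma(h(X)) = \sigma(X) = h\sigma'(X)$. For $b' \in B'$
we obtain $\sigma h(b') = h(b') = h\sigma'(b')$ since $\sigma'$ and $\sigma$ are the identity on $B'$ and $B$
respectively. It follows that $\sigma h = h\sigma'$ and hence, $\alpha\sigma(W) = \alpha h\sigma'(W')$. Next,

\[ \sigma(W) = \sigma(h(W')) = h(\sigma'(W')) = h(\sigma'(\overline{W'})) = \sigma(h(\overline{W'})) = \sigma(\overline{h(W')}) = \sigma(\overline{W}). \]

Moreover, if $X \in \mathcal{X}$ and $\theta(X)$ is defined, then $\theta(X) = \theta'(X) = c \in B \cap B'$, and
$h(c) \in c^*$ by hypothesis. Hence, $\sigma'(X) \in c^*$ and therefore $\sigma(X) = h\sigma'(X) \in c^*$,
too. Thus, $\sigma$ is a $B$-solution at $V$ and, consequently, $(\alpha, \sigma)$ is a solution at $V$. Finally,
since $\sigma(X) = h\sigma'(X)$ we obtain

\[ \|\alpha, \sigma\| = \sum_{X \in \mathcal{X}} |\alpha\sigma(X)| = \sum_{X \in \mathcal{X}} |\alpha h\sigma'(X)| = \|\alpha h, \sigma'\|. \]

□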

During the process of finding a solution, the parameters $W, B, \mathcal{X}, \theta, \mu$ change. We
describe the possible changes in terms of a directed graph, which will be converted
into an NFA.

3.6. The NFA $\mathcal{F}$ and the trimmed NFA $\mathcal{A}$. We are ready to define the NFA
$\mathcal{A}$ mentioned in Theorem 1 in the case where $M(A) = A^*$ is a free monoid with
involution.

3.6.1. States. We start by building an NFA $\mathcal{F}$ whose states are all the extended
equations $(W, B, \mathcal{X}, \theta, \mu)$ according to Definition 8. We will later obtain $\mathcal{A}$ by
trimming, that is, by removing all states which are not on accepting paths. Thus,
the only difference between $\mathcal{F}$ and $\mathcal{A}$ is that $\mathcal{A}$ doesn’t have superfluous states.

Lemma 16. An extended equation $V = (W, B, \mathcal{X}, \theta, \mu)$ can be specified using at
most $O(n \log n)$ bits, so $\mathcal{F}$ has not more than singly exponentially many states.

Proof. We claim that each component of $V$ can be specified using $O(|\Gamma|) = O(n)$
letters from $\Gamma$ plus a finite alphabet. Since $|\Gamma| \in O(n)$, we can encode each letter
in $\Gamma$ plus the finite alphabet as a binary number of length at most $O(\log n)$ bits.
Thus $V$ can be encoded by a binary string of length in $O(n \log n)$. It follows that
the total number of extended equations is at most $2^{O(n \log n)}$.

To establish the claim, notice that $W \in \Gamma^*$ with $|W| \le 204n$, $B \cup \mathcal{X} \subseteq \Gamma$,
$\theta \subseteq \Gamma \times \Gamma$ and $|\theta| \le |B \cup \mathcal{X}|$. Since $\mu$ maps $B \cup \mathcal{X}$ into a finite monoid, $\mu$ can
be encoded as a list $\{(c, \mu(c)) \mid c \in B \cup \mathcal{X}\}$, using letters from $\Gamma$ plus the finite
alphabet formed by the elements of that monoid. □
Initial states. An initial state is any state of the form $(W_{\mathrm{init}}, A, \mathcal{X}_{\mathrm{init}}, \emptyset, \mu_{\mathrm{init}})$, where

Minit ■ (A U t 

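is a morphism extending $\mu_0$ such that $\mu_{\mathrm{init}}(X) \ne 0$ for all $X \in \mathcal{X}_{\mathrm{init}}$.

If $(\alpha, \sigma)$ is a solution of $(W_{\mathrm{init}}, A, \mathcal{X}_{\mathrm{init}}, \emptyset, \mu_{\mathrm{init}})$, then necessarily $\alpha = \mathrm{id}_{A^*}$ since
$\alpha$ leaves the letters from $A$ invariant. Moreover, we know that $\mu_{\mathrm{init}}(X) = \mu_0\sigma(X)$.
This means that the initial value of $\mu_{\mathrm{init}}(X)$ tells us whether $\sigma(X) = 1$; and if
$\sigma(X) \ne 1$, then $\mu_{\mathrm{init}}(X) = (a, b)$ and $\sigma(X) \in aA^* \cap A^*b$. Hence, $\mu_{\mathrm{init}}(X)$ specifies
the first and last letters of the reduced word $\sigma(X)$ whenever $\sigma(X) \ne 1$. Moreover,
$\mu_{\mathrm{init}}(X) \ne 0$ implies $\alpha\sigma(X) \in F$. Hence, $\alpha\sigma(X)$ is a reduced word in $A_\pm^*$.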

Final states. We choose and fix “distinguished” letters $c_1, \ldots, c_m \in C \setminus A$ such that
$c_i \ne c_j \ne \overline{c_i}$ for all $i \ne j$. We say that a state $(W, B, \emptyset, \emptyset, \mu)$ is final if

(1) $\overline{W} = W$,

(2) The word $W$ has a prefix of the form $\#c_1\# \cdots \#c_m\#$.

Every final state has the unique $B$-solution $\sigma = \mathrm{id}_{B^*}$ because final states don’t
have any variables.

Remark 17. The names initial and final refer to the phase in the construction of
the graph at which a state is produced, rather than being start or accept states for
the NFA. That is, when we obtain the EDT0L language characterization, the start
states of the NFA recognising the rational language of endomorphisms correspond
to the final states defined here, and the accept states correspond to the initial states.

3.7. Transitions. We define two different forms of transitions, based on substitutions
and compressions. Both forms are labeled by an endomorphism of $C^*$
which induces a morphism between partially commutative monoids $M(B, \mathcal{X}, \theta, \mu)$
and $M(B', \mathcal{X}', \theta', \mu')$.

The direction of each transition is opposite to that of the morphism labelling 
the transition. Suppose we have a path p from an initial to a final state. A very 
important (and, perhaps, initially counterintuitive) fact is that in order to produce 
solutions, our algorithm follows the path p backwards, that is, from the final to 
the initial state; we compose the morphisms labeling the transformations in such a 
directed path p from the last edge to the first one, in order to produce the solutions. 
This is in agreement with our initial and final states being accept and start states 
in the NFA, respectively. 

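3.7.1. Substitutions. A substitution transition transforms the variables and does
not affect the constants. Let $V = (W, B, \mathcal{X}, \theta, \mu)$ and $V' = (W', B, \mathcal{X}', \theta', \mu')$ be
states in $\mathcal{F}$ sharing the same set of constants $B$; and assume that $V$ is not final and
that $V'$ is not an initial state. Moreover, let $\theta(b) = \theta'(b)$ and $\mu(b) = \mu'(b)$ for all
$b \in B$. Therefore $M(B) = M(B, \emptyset, \theta, \mu) = M(B, \emptyset, \theta', \mu')$.

Let $\tau : M(B, \mathcal{X}, \theta, \mu) \to M(B, \mathcal{X}', \theta', \mu')$ be any $B$-morphism such that $\tau(W) = W'$,
$\tau$ modifies only $X$ and $\overline{X}$ for some variable $X$, leaves all $x \in (B \cup \mathcal{X}) \setminus \{X, \overline{X}\}$
invariant, and

\[ \tau(X) \in (B \cup \mathcal{X}')^* \text{ with } |\tau(X)| \le 3. \]

Furthermore, we only allow the following choices for $\tau(X)$, $\mathcal{X}$ and $\mathcal{X}'$:

(i) $\tau(X) = 1$ and $\mathcal{X}' = \mathcal{X} \setminus \{X, \overline{X}\}$.

(ii) $\tau(X) = uX$ and $\mathcal{X}' = \mathcal{X}$ with $u \in B^*$ and $1 \le |u| \le 2$.

(iii) $\tau(X) = cX'X$ and $\mathcal{X} = \mathcal{X}' \setminus \{X', \overline{X'}\}$ with $c \in B$ and $\theta'(X') = c$.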
In each of these three cases we define the substitution transition:

\[ V = (W, B, \mathcal{X}, \theta, \mu) \xrightarrow{\varepsilon} (\tau(W), B, \mathcal{X}', \theta', \mu') = V'. \]

Here, the label $\varepsilon$ denotes the identity morphism $\mathrm{id}_{C^*}$; it restricts to the identity
morphism from $M(B, \emptyset, \theta', \mu')$ to $M(B, \emptyset, \theta, \mu)$, and it will be applied in the opposite
direction from $\tau$ and the transition. Note that after having performed a substitution
transition we have $\|V'\| < \|V\|$ if and only if $\tau$ is defined by $\tau(X) = 1$ for some $X$.
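
The effect of a substitution on the word W can be sketched as follows. The encoding is an assumption made for illustration only: symbols are strings, the involution toggles a leading `~`, and `#` is self-involuting; the function names are hypothetical. Since $\tau$ respects the involution, replacing X by a word also replaces $\overline{X}$ by the involuted word.

```python
def bar(x):
    """Involution on symbols; '#' is the only self-involuting symbol."""
    if x == "#":
        return x
    return x[1:] if x.startswith("~") else "~" + x

def bar_word(w):
    """Involution on words: reverse the word and involute each symbol."""
    return [bar(x) for x in reversed(w)]

def substitute(W, X, image):
    """Apply tau with tau(X) = image (and tau(~X) = bar_word(image)) to W."""
    barred_image = bar_word(image)
    out = []
    for x in W:
        if x == X:
            out.extend(image)
        elif x == bar(X):
            out.extend(barred_image)
        else:
            out.append(x)
    return out

W = ["#", "a", "X", "#", "~X", "~a", "#"]
popped = substitute(W, "X", ["a", "X"])   # a substitution of type (ii)
removed = substitute(W, "X", [])          # a substitution of type (i)
```

Only the type (i) substitution shortens the word, in line with the remark that the weight of the state decreases exactly in that case.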

3.7.2. Compressions. A compression transition affects the constants, but does not
change the variables. Let $V = (W, B, \mathcal{X}, \theta, \mu)$ and $V' = (W', B', \mathcal{X}, \theta', \mu')$ be states
in $\mathcal{F}$ sharing the same set of variables $\mathcal{X}$ and assume $V$ is not a final state,
$\theta(X) = \theta'(X)$ and $\mu(X) = \mu'(X)$ for all $X \in \mathcal{X}$.

Let $h : M(B', \mathcal{X}, \theta', \mu') \to M(B, \mathcal{X}, \theta, \mu)$ be any $(A \cup \mathcal{X})$-morphism such that
$W = h(W')$ and

(1) if $V'$ is non-final, then $1 \le |h(c)| \le 2$ for all $c \in B'$,

(2) if $V'$ is final, then $|h(c)| \le |W|$.

In case that either $\|V\| > \|V'\|$, or $V'$ is final and $h \ne \mathrm{id}_{C^*}$, we define a compression
transition in $\mathcal{F}$ by

\[ V = (h(W'), B, \mathcal{X}, \theta, \mu) \xrightarrow{h} (W', B', \mathcal{X}, \theta', \mu') = V', \]

where the transition label $h$ is given by an endomorphism $h \in \mathrm{End}(C^*)$ which
induces the morphism $h : M(B', \mathcal{X}, \theta', \mu') \to M(B, \mathcal{X}, \theta, \mu)$ and which leaves all
letters not in $B'$ invariant. The direction of the morphism $h$ is again opposite to
that of the transition.

Remark 18. The reason that we have to treat transitions to final states differently
is twofold. First, the coexistence of “singular” and “nonsingular” solutions is
possible. In the singular case we have $\sigma(X) = 1$ for some $X$ and in the nonsingular
case we have $\sigma(X) \ne 1$ for all $X$. Say there are solutions $\sigma$ and $\sigma'$ such that
$\sigma(X_1) = 1$ and $\sigma'(X_1) = a \in A_\pm$. Then for some $h, h' \in L(\mathcal{A})$ and some $c_1$ we
must have $h(c_1) = 1$ and $h'(c_1) = a$. Thus in transformations to a final state we
must allow that $h$ maps some letters to the empty word. In all other situations this
is forbidden. Thus, if $V \xrightarrow{h} V'$ is a compression transition and $V'$ is final, then we
allow $\|V\| \le \|V'\|$.

Second, if a state $V = (W, B, \emptyset, \theta, \mu)$ has no variables, then $W$ has prefix
$\#u_1\# \cdots \#u_m\#$ with $u_i \in C^*$. In this case we wish to allow a compression
transition $h$ to a final state in one step. By imposing the condition $|h(c)| \le |W|$
we make sure the specification of $h$ fits into our linear space bound, which is crucial
in our complexity analysis below.

Example 19. Let $U = aX$ and $V = aaab$ be an equation, for the purposes of
demonstrating how the graph or NFA works. We have

\[ W_{\mathrm{init}} = \#X\#aX\#aaab\#\overline{X}\,\overline{a}\#\overline{b}\,\overline{a}\,\overline{a}\,\overline{a}\#\overline{X}\#. \]

A path from initial to final states in the graph $\mathcal{F}$ for this equation is shown in
Figure 2, where for simplicity we label states by a prefix of $W$ in each extended
equation.
Figure 2. A path in $\mathcal{F}$ from initial to final state for the equation
$aX = aaab$. The solution $\sigma(X)$ is obtained by applying the maps
$h_1, h_2, h_3, h_4, h_5$ to $c_1$ in reverse order, that is, $\sigma(X) =
h_1h_2h_3h_4h_5(c_1)$.


The first four transitions are substitutions $\tau_1(X) = \tau_2(X) = aX$, $\tau_3(X) =
bX$, $\tau_4(X) = 1$, so $h_1, h_2, h_3, h_4$ are just $\mathrm{id}_{C^*}$, and the map $h_5(c_1) = aab$ is a
compression to a final state. A solution for $X$ can be obtained by applying the maps
to $c_1$ in reverse order to the path labelling, so we get $\sigma(X) = h_1h_2h_3h_4h_5(c_1) =
h_5(c_1) = aab$.
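
The computation in Example 19 can be replayed with a short sketch. The encoding is an assumption for illustration: each label is a dictionary from letters to lists of letters, with unlisted letters left invariant, so the empty dictionary plays the role of the identity; the names `apply_morphism` and `labels` are hypothetical.

```python
def apply_morphism(h, word):
    """Apply the morphism given letterwise by dict h (identity elsewhere)."""
    out = []
    for x in word:
        out.extend(h.get(x, [x]))
    return out

identity = {}
# Labels h1, ..., h5 read along the path from the initial to the final state.
labels = [identity, identity, identity, identity, {"c1": ["a", "a", "b"]}]

# Apply the labels to the seed letter c1 in reverse order: h5 first, h1 last.
word = ["c1"]
for h in reversed(labels):
    word = apply_morphism(h, word)
sigma_X = "".join(word)
```

Substituting the result into the equation gives `"a" + sigma_X == "aaab"`, so the value found for X does solve aX = aaab.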

3.8. Proof that the NFA is constructed in quasi-linear space. We can now
give the algorithm to construct the trim NFA $\mathcal{A}$ in NSPACE($n \log n$). We first give
an algorithm to construct $\mathcal{F}$, then use this to construct $\mathcal{A}$.

Lemma 20. Given a tuple $V = (W, B, \mathcal{X}, \theta, \mu)$, where $W \in \Gamma^*$, $B \subseteq C$, $\mathcal{X} \subseteq \Omega$, $\theta$
is a type, and $\mu : (B \cup \mathcal{X}) \to N$ is a mapping, we can check within NSPACE($n \log n$)
whether $V$ is an extended equation (that is, $V$ is a state in $\mathcal{F}$) and furthermore
decide whether the state $V$ is initial or final.

Proof. As noted in Lemma 16, writing down any extended equation requires at
most $O(n \log n)$ bits, so if $V$ requires more space we reject it as a valid input. If $V$
fits into the allowed space, then go through the conditions listed in Definition 8. It
is obvious how to check the first five conditions. For example, if $|W| > 204n$, then
we reject immediately.

The most involved test is to see that for every factor $u$ of every $u_i$ with the
interpretation $u_i \in M(\Gamma, \theta)$ the element $\overline{u}$ also appears in $W \in M(\Gamma, \theta)$. For
this test we invoke the algorithm that solves the uniform factor problem in free
partially commutative monoids as explained in Subsection 1.2. Recall that the
uniform factor problem refers to an input of the form $(\Gamma, \theta, u, w)$. In our case the
input has the specific form $(\Gamma, \theta, \overline{u}, W)$. We presented a nondeterministic algorithm
using linear space in the input size, where the input size of a tuple $(\Gamma, \theta, u, w)$
is $(|\Gamma| + |\theta| + |uw|) \log |\Gamma|$, as we need $\Omega(\log |\Gamma|)$ bits to encode letters. Since
$(|\Gamma| + |\theta| + |uw|) \log |\Gamma| \in O(n \log n)$, the call of such a subroutine fits into our space
bound.

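Having completed the check that $V$ is a state of $\mathcal{F}$, it is easy to check whether it
is initial ($W = W_{\mathrm{init}}$, $B = A$, $\theta = \emptyset$) or final ($W = \overline{W}$, $\theta = \emptyset$, $\mathcal{X} = \emptyset$); since $\theta = \emptyset$
in both cases we are just checking $W = W_{\mathrm{init}}$, $W = \overline{W}$ in a free monoid. □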

In the following, when we say that $V = (W, B, \mathcal{X}, \theta, \mu)$ is a state in $\mathcal{F}$, this means
$V$ is given as a tuple for which the syntax check according to Lemma 20 that $V$ is
indeed a state was performed.

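Lemma 21. Given states $V = (W, B, \mathcal{X}, \theta, \mu)$, $V' = (W', B', \mathcal{X}', \theta', \mu')$ in $\mathcal{F}$, and
a mapping $h : B' \to B^*$, we can check within NSPACE($n \log n$) whether the triple
$(V, V', h)$ encodes a transition $V \xrightarrow{h} V'$ in the graph $\mathcal{F}$.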

Proof. We assume $h$ is specified as a tuple requiring at most $O(n \log n)$ bits. In order
to check whether $V \xrightarrow{h} V'$ is a compression transition we must have $h \ne \mathrm{id}_{B^*}$ and
then we go through the conditions of Subsection 3.7.2, most of which are immediate
to verify. Among these, we have to compute $h(W')$ as a word in $(B \cup \mathcal{X})^*$ and
then see if $W = h(W') \in M(B \cup \mathcal{X}, \theta)$. The test $W = h(W') \in M(B \cup \mathcal{X}, \theta)$ is a
special case of the uniform factor problem in free partially commutative monoids,
as already discussed in the proof of Lemma 20.

For a substitution transition, a necessary condition is $B = B'$ and $h = \mathrm{id}_{B^*}$,
which is trivial to check. Next we guess some mapping $\tau : \mathcal{X} \to (B \cup \mathcal{X}')^*$ with
$|\tau(X)| \le 3$ for all $X \in \mathcal{X}$. Just as above we check $\tau(W) = W' \in M(B' \cup \mathcal{X}', \theta')$
and the other requirements for substitutions listed in Subsection 3.7.1. □

As usual in automata theory we modify the NFA $\mathcal{F}$ by removing all states which
are not on a path from some initial to some final state. If there is no such path,
then $L(\mathcal{F})$ is the empty set. The resulting NFA will be denoted as $\mathcal{A}$. We have
$L(\mathcal{A}) = L(\mathcal{F})$. Moreover, $L(\mathcal{A}) = \emptyset$ if and only if the automaton $\mathcal{A}$ is empty.

The key tool used to build the trim NFA $\mathcal{A}$ is ISPATH($V, V'$), which we define to
be a Boolean predicate that yields true if and only if there is a path from state $V$
to $V'$ in the graph $\mathcal{F}$.

Lemma 22. Let $V, V'$ represent two states in the graph $\mathcal{F}$. Then the predicate
ISPATH($V, V'$) can be evaluated in NSPACE($n \log n$).

Proof. Define the language $L_{\mathcal{F}} = \{(V, V') \mid \mathrm{ISPATH}(V, V') = \mathrm{true}\}$. On input
$(V, V')$ we can guess a path $V = V_0, V_1, h_1, V_2, h_2, \ldots, V' = V_k, h_k$ in $\mathcal{F}$ from $V$
to $V'$ and check for each $i$ whether $(V_{i-1}, V_i, h_i)$ encodes a transition by using
Lemmas 20 and 21. Thus, $L_{\mathcal{F}} \in$ NSPACE($n \log n$).
Since NSPACE($n \log n$) is closed under complementation by Immerman and Szelepcsényi
(see m Theorem 7.6]), we also have 

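\[ \overline{L_{\mathcal{F}}} = \{(V, V') \mid \nexists \text{ a path from } V \text{ to } V' \text{ in } \mathcal{F}\} \in \mathrm{NSPACE}(n \log n). \]

Thus, the predicate ISPATH($V, V'$) can be evaluated in NSPACE($n \log n$) by running
two procedures simultaneously to determine if $(V, V') \in L_{\mathcal{F}}$ or $(V, V') \in \overline{L_{\mathcal{F}}}$. □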
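
To make the meaning of the predicate concrete, here is a deterministic sketch of ISPATH on an explicitly given transition list. This is not the procedure of the proof: the proof guesses a path nondeterministically and stores only the current state, whereas the breadth-first search below keeps a set of visited states and would therefore need exponential space on the full graph. The function name `is_path` and the encoding of transitions as triples of hashable values are assumptions for this illustration; a path of length zero is counted as a path.

```python
from collections import deque

def is_path(transitions, source, target):
    """Return True if target is reachable from source.

    transitions -- iterable of triples (V, V_prime, h), one per transition
    """
    successors = {}
    for v, w, _h in transitions:
        successors.setdefault(v, []).append(w)
    seen = {source}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        if v == target:
            return True
        for w in successors.get(v, []):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return False
```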

Proposition 23. We can construct the trim NFA $\mathcal{A}$ in NSPACE($n \log n$). Within
the same space complexity we can decide whether $\mathcal{A}$ is empty, or whether $\mathcal{A}$ contains
a directed cycle.

Proof. For each $V$ that is a state of $\mathcal{F}$, output $V$ as an initial node of $\mathcal{A}$ if both (1)
$V$ is initial in $\mathcal{F}$, and (2) there exists some path to a final state in $\mathcal{F}$. We check (1)
using Lemma 20. For (2) we run through all final states $V'$ of $\mathcal{F}$ and evaluate the
predicate ISPATH($V, V'$). If at some point ISPATH($V, V'$) becomes true, we output
$V$ as an initial node in $\mathcal{A}$. If no initial node in $\mathcal{A}$ is found, then we stop; the output
is $\mathcal{A} = \emptyset$. Hence, we continue only if there is at least one initial node.

Next, we construct all transitions of $\mathcal{A}$ as follows. We list all triples $(V, V', h)$
where $V \xrightarrow{h} V'$ is a transition in $\mathcal{F}$. For each such triple we consider all states $V_0$
of $\mathcal{A}$ which are initial, and for each $V_0$ we evaluate ISPATH($V_0, V$). If no such $V_0$ is
found where ISPATH($V_0, V$) is true, then we move to the next triple $(V, V', h)$. If at
least one such $V_0$ exists, we list all states $V_f$ of $\mathcal{F}$ which are final. For each $V_f$ we
evaluate ISPATH($V', V_f$). If no such $V_f$ is found where ISPATH($V', V_f$) is true, then
we move to the next triple $(V, V', h)$. Otherwise we output $(V, V', h)$ as a transition
of $\mathcal{A}$. If, moreover, $V'$ is final in $\mathcal{F}$, then we mark that transition in order to indicate
that $V'$ is final in $\mathcal{A}$, too. We then move to the next triple $(V, V', h)$.

Having these two lists at hand we have constructed the trim NFA $\mathcal{A}$.

Finally, to check for a directed cycle we enumerate all pairs $(V, V') \in \mathcal{A} \times \mathcal{A}$ with
$V \ne V'$ and for each pair evaluate ISPATH($V, V'$) and ISPATH($V', V$). □
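
The trimming and the cycle test can be sketched on a small explicit graph. As before, this is a deterministic illustration under assumed encodings (transitions as triples of hashable values, hypothetical function names), and it stores reachability sets, so it does not reflect the space bound of the proposition.

```python
def reachable(edges, starts):
    """Set of nodes reachable from any node in starts along the given edges."""
    succ = {}
    for v, w in edges:
        succ.setdefault(v, set()).add(w)
    seen = set(starts)
    stack = list(starts)
    while stack:
        v = stack.pop()
        for w in succ.get(v, ()):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

def trim(transitions, initial, final):
    """Keep the states and transitions lying on some initial-to-final path."""
    forward = reachable([(v, w) for v, w, _ in transitions], initial)
    backward = reachable([(w, v) for v, w, _ in transitions], final)
    useful = forward & backward
    kept = [(v, w, h) for v, w, h in transitions if v in useful and w in useful]
    return useful, kept

def has_cycle(transitions):
    """True if some transition V -> V' has V reachable again from V'."""
    edges = [(v, w) for v, w, _ in transitions]
    return any(v in reachable(edges, [w]) for v, w in edges)

ts = [("i", "a", "h1"), ("a", "f", "h2"), ("a", "a2", "h3"),
      ("x", "f", "h4"), ("a2", "a", "h5")]
useful, kept = trim(ts, {"i"}, {"f"})
```

In this toy graph the state `x` is dropped because no initial state reaches it, and the remaining automaton contains the cycle between `a` and `a2`.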

With the assertion in Proposition 23 the algorithmic part of the proof of the
monoid version of Theorem 1 is finished. It remains to show the soundness and
completeness of the construction. This requires purely existential statements, where
no reference to effectiveness is necessary.

3.9. Soundness. In this section we prove soundness, that is, any output we obtain
by following the transitions in the NFA $\mathcal{A}$ from an initial to a final state, and then
applying the corresponding maps in reverse order to the distinguished letters, gives
a correct solution to the equation $W_{\mathrm{init}}$.

Recall that we have chosen distinguished letters $c_1, \ldots, c_m \in C$, and that if
$(W, B, \emptyset, \emptyset, \mu)$ is a final state, then $W = \overline{W}$ and $W$ has the prefix $\#c_1\# \cdots \#c_m\#$.

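Proposition 24. Let $V_0 \xrightarrow{h_1} \cdots \xrightarrow{h_t} V_t$ be a path in $\mathcal{A}$ of length $t$, where $V_0 =
(W_{\mathrm{init}}, A, \mathcal{X}_{\mathrm{init}}, \emptyset, \mu_{\mathrm{init}})$ is an initial and $V_t = (W, B, \emptyset, \emptyset, \mu)$ is a final state. Then
$V_0$ has a solution $(\mathrm{id}_{A^*}, \sigma)$ with $\sigma(W_{\mathrm{init}}) = h_1 \cdots h_t(W)$. Moreover, for $1 \le i \le m$
we have

\[ \sigma(X_i) = h_1 \cdots h_t(c_i). \]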

Proof. Let $s \ge 0$ and $V_0 \xrightarrow{h_1} \cdots \xrightarrow{h_s} V_s$ be any path to some state $V_s = (W_s, B, \mathcal{X}, \theta, \mu)$
such that $\sigma_s$ is a $B$-solution at $V_s$. We claim that $V_0$ and $V_s$ have solutions $(\mathrm{id}_{A^*}, \sigma)$
and $(\mathrm{id}_{A^*}h_1 \cdots h_s, \sigma_s)$, respectively, with

(12) $\sigma(W_{\mathrm{init}}) = h_1 \cdots h_s\sigma_s(W_s)$.

Claim (12) is trivial for $s = 0$ and for $s > 0$ it follows by induction using Lemma 14
or Lemma 15 depending on whether $h_s$ is a substitution transition or a compression
transition. Now for $s = t$ we have $W = \overline{W}$ by the definition of a final state.
Since no variables occur in $W$, $\sigma_t = \mathrm{id}_{B^*}$ is the (unique) $B$-solution of $W$, so
$\sigma(W_{\mathrm{init}}) = h_1 \cdots h_t(W)$.

By definition $\#X_1\# \cdots \#X_m\#$ is a prefix of $W_{\mathrm{init}}$ and $\#c_1\# \cdots \#c_m\#$ is a prefix
of $W$ for the final state $V_t$, but $h = \mathrm{id}_{A^*}h_1 \cdots h_t$ is an $A$-morphism from $B^*$ to $A^*$
with $|h(c)|_\# = 0$ for all $c \in B$ with $c \ne \#$. This implies

\[ \sigma(\#X_1\# \cdots \#X_m\#) = h(\#c_1\# \cdots \#c_m\#). \]

In particular, $\sigma(X_i) = h_1 \cdots h_t(c_i)$ for $1 \le i \le m$. □

Using the notation of Theorem 3 we have shown soundness, that is, every output
we obtain is a solution in reduced words.

Corollary 25. The following inclusion holds: 

{{h{ci), . . . , h{Cm)) €C* X---XC* \ hG L{A)} c 

IJ {{a{Xi ),. . . , a{Xm)) G F"* I cr G E A cr(Winit) = cr(Winit) A fj. = ^J-oa}, 
{mIm(a)/o} 

where S denotes the set of C-morphisms cr : L* —> C*. 

Proof. Follows from Proposition 24. □

Corollary 26. If the NFA $\mathcal{A}$ is nonempty, then there is some solution $\sigma$ which maps
all variables $X_i$ to reduced words in $A_\pm^*$ and which satisfies $\sigma(W_{\mathrm{init}}) = \sigma(\overline{W_{\mathrm{init}}})$.
If the NFA $\mathcal{A}$ contains a directed cycle, then there are infinitely many such $\sigma$.

Proof. The first part follows from Proposition 24.

Now assume that $\mathcal{A}$ contains a directed cycle. Then for every $t_0 \in \mathbb{N}$ we can
choose a path $V_0 \xrightarrow{h_1} \cdots \xrightarrow{h_t} V_t$ from an initial state $V_0$ to some final state $V_t$ with
$t \ge t_0$. For each $0 \le s \le t$ define $\alpha_s = \mathrm{id}_{A^*}h_1 \cdots h_s$. Thus, $\alpha_0 = \mathrm{id}_{A^*}$. We view
$\alpha_s \in \mathrm{End}(C^*)$, and let $(\alpha_s, \sigma_s)$ be the corresponding solution at $V_s$, which exists
due to (12).

For every transition $V_{s-1} \xrightarrow{h_s} V_s$ which is defined either by a compression, or
by a substitution of type (i), we have $\|V_{s-1}\| > \|V_s\|$. Since $\|V_s\| \in O(n^4)$ for all
states, there is a constant $\kappa'$ such that every path of length $\kappa' n^4$ must include a
substitution of type (ii) or (iii). Thus, we may assume that for a large enough $t$
there are more than $t_0$ transitions where $V_{s-1} \xrightarrow{h_s} V_s$ is defined by a substitution
of type (ii) or (iii), i.e. with $\tau(X) \in \Gamma^*C\Gamma^*$.

By the definition of $\mathcal{A}$ we have $\alpha_s(c) \ne 1$ for all $c \in C$ whenever $s < t$. (The
final transition is an exception.) By Lemma 14 and Lemma 15 we have

\[ \|\alpha_0, \sigma_0\| \ge t_0, \]

since for each compression transition the weight is unchanged, and for each substitution
the weight does not increase, and in particular, it decreases strictly at least $t_0$ times.
The result follows since $\alpha_0 = \mathrm{id}_{A^*}$. Hence, there are infinitely many solutions $\sigma_0$. □






3.10. Completeness. Now we show that every solution of the equation $W_{\mathrm{init}}$ can
be obtained from $\mathcal{A}$.

Let us fix some state $V = (W, B, \mathcal{X}, \theta, \mu)$ and assume that $V$ has a solution $(\alpha, \sigma)$.
We will show that if $V$ is “small enough”, then $\mathcal{A}$ contains a path
$V \xrightarrow{h_1} V_1 \cdots \xrightarrow{h_t} V_t$ to some final state $V_t = (W', B', \emptyset, \emptyset, \mu')$ such that
$\sigma(W) = h_1 \cdots h_t(W')$. Let us make precise what “small” means.

Definition 27. A state $V = (W, B, \mathcal{X}, \emptyset, \mu)$ is called small if

\[ |W| \le 96n + 6|W_{\mathrm{init}}|. \]

Clearly every initial state is small. Final states need not be small. 

3.10.1. Forward property of transitions. The existence of a path
$V \xrightarrow{h_1} V_1 \cdots \xrightarrow{h_t} V_t$
to some final state $V_t = (W', B', \emptyset, \emptyset, \mu')$ such that $\sigma(W) = h_1 \cdots h_t(W')$ relies on
the following technical concept.

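Definition 28. Let $V = (W, B, \mathcal{X}, \theta, \mu) \xrightarrow{h} (W', B', \mathcal{X}', \theta', \mu') = V'$ be a transition
in $\mathcal{A}$ and $(\alpha, \sigma)$ be a solution at $V$. We say that the triple $(V \xrightarrow{h} V', \alpha, \sigma)$ satisfies
the forward property if there exists a solution $(\alpha h, \sigma')$ at $V'$ such that

\[ \alpha\sigma(W) = \alpha h\sigma'(W'). \]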

By a slight abuse of language: if $V \xrightarrow{h} V'$ is a transition in $\mathcal{A}$ and the solution
$(\alpha, \sigma)$ at the source $V$ is clear from the context, then we say also that the transition
$V \xrightarrow{h} V'$ satisfies the forward property. In particular, if we follow a path from
$V$ having a solution $(\alpha, \sigma)$ to some state $V' = (W', B', \emptyset, \theta', \mu')$ by transitions
satisfying the forward property, then $V'$ has some solution. But as $V'$ uses no
variables, we obtain $W' = \overline{W'}$.

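Lemma 29. Let $V = (W, B, \mathcal{X}, \theta, \mu) \xrightarrow{\varepsilon} (\tau(W), B, \mathcal{X}', \theta', \mu') = V'$ be a substitution
transition (according to Subsection 3.7.1) and $\theta(Y) = \theta'(Y)$ for all $Y \in \mathcal{X} \cap \mathcal{X}'$.
In each of the following cases $(V \xrightarrow{\varepsilon} V', \alpha, \sigma)$ satisfies the forward property:

(1) $\sigma(X) = 1$ and the transition $V \xrightarrow{\varepsilon} V'$ removes $X$ by $\tau(X) = 1$;

(2) $\theta = \emptyset$, $\sigma(X) = av$, $\mu'(X) = \mu(v)$, and the transition $V \xrightarrow{\varepsilon} V'$ is defined by
$\tau(X) = aX$;

(3) $\theta(X) = \emptyset$, $\sigma(X) = cuv$, $u \in c^*$, $\mu'(X') = \mu(u)$, $\mu'(X) = \mu(v)$, and the
transition $V \xrightarrow{\varepsilon} V'$ is defined by $\tau(X) = cX'X$ with $\theta'(X') = c$;

(4) $\theta(X) = c$, $\sigma(X) = cu$, $\mu'(X) = \mu(u)$, and the transition $V \xrightarrow{\varepsilon} V'$ substitutes
$X$ by $\tau(X) = cX$.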

Proof. Let $V \xrightarrow{\varepsilon} V'$ be defined by $\tau : M(B, \mathcal{X}, \theta, \mu) \to M(B, \mathcal{X}', \theta', \mu')$. It is
enough to show that $V'$ has a $B$-solution $\sigma'$ with $\sigma = \sigma'\tau$.

(1) Let $\sigma'$ be the restriction of $\sigma$ to $\mathcal{X}' = \mathcal{X} \setminus \{X, \overline{X}\}$. Then we have $\sigma = \sigma'\tau$.

(2) Recall that by definition of a substitution transition, we have $\theta' = \emptyset$,
too. Define $\sigma'$ by $\sigma'(X) = v$ and $\sigma'(Y) = \sigma(Y)$ for $Y \ne X, \overline{X}$. Since
$\mu'(X) = \mu(v)$, we obtain $\sigma'$ as a morphism; and we have $\sigma = \sigma'\tau$.

(3) Define $\sigma'(X') = u$, $\sigma'(X) = v$ and $\sigma'(Y) = \sigma(Y)$ for $Y \ne X', \overline{X'}, X, \overline{X}$.
Then we have $\sigma = \sigma'\tau$.

(4) Define $\sigma'(X) = u$ and $\sigma'(Y) = \sigma(Y)$ for $Y \ne X, \overline{X}$. Since $\theta(X) = c$ and $\sigma$
is a solution, we have $u \in c^*$ and as $\tau$ is a morphism we have $\theta'(X) = c$,
too. Then we have $\sigma = \sigma'\tau$.

In all cases it is clear that $\sigma'$ is a $B$-solution. □

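Lemma 30. Let $B' \subseteq B$ and $V = (h(W'), B, \mathcal{X}, \theta, \mu) \xrightarrow{h} (W', B', \mathcal{X}, \theta', \mu') = V'$
be a compression transition (according to Subsection 3.7.2). If $\sigma : \mathcal{X} \to M(B, \emptyset, \theta, \mu)$
factors through morphisms as

\[ \sigma : \mathcal{X} \xrightarrow{\sigma'} M(B', \emptyset, \theta', \mu') \xrightarrow{h} M(B, \emptyset, \theta, \mu) \]

such that $\sigma'(X) \in c^*$ whenever $\theta'(X) = c$, then $(\alpha h, \sigma')$ is a solution at $V'$ and
$(V \xrightarrow{h} V', \alpha, \sigma)$ satisfies the forward property.

Proof. We have $\sigma h = h\sigma'$ and hence, $\alpha\sigma(W) = \alpha h\sigma'(W')$. □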


Frequently, we cannot apply Lemma 30 because $\sigma$ cannot be written as $h\sigma'$.
The typical example is that $B' \subsetneq B$, but some $\sigma(X)$ uses a letter from $B \setminus B'$, and
$h(a) = a$ for all $a \in B'$. This type of “alphabet reduction”, switching from a larger
alphabet $B$ to some proper subset $B'$, is needed only if the type relations $\theta, \theta'$ are
empty. Therefore the following lemma applies in this situation.

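Lemma 31. Let $B' \subsetneq B$ and $V = (W, B, \mathcal{X}, \emptyset, \mu) \xrightarrow{\varepsilon} (W', B', \mathcal{X}, \emptyset, \mu') = V'$ be a
compression transition which is induced by the identity $\mathrm{id}_{C^*}$. Thus, $\varepsilon$ becomes the
canonical inclusion of $M(B', \emptyset, \emptyset, \mu')$ into $M(B, \emptyset, \emptyset, \mu)$. In particular, $W = W'$
and $\mu'$ is the restriction of $\mu$.

Let $(\alpha, \sigma)$ be a solution at $V$. Define a $B'$-morphism $\beta : M(B, \emptyset, \emptyset, \mu) \to
M(B', \emptyset, \emptyset, \mu')$ by $\beta(b) = \alpha(b)$ for $b \in B \setminus B'$ and $\beta(b) = b$ for $b \in B'$. Let
$\sigma'(X) = \beta\sigma(X)$. Then $(\alpha\varepsilon, \sigma')$ is a solution at $V'$ with $\alpha\sigma(W) = \alpha\varepsilon\sigma'(W')$. In
particular, $(V \xrightarrow{\varepsilon} V', \alpha, \sigma)$ satisfies the forward property.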


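Proof. Since $\alpha : M(B, \emptyset, \emptyset, \mu) \to M(A, \emptyset, \emptyset, \mu_0)$ is an $A$-morphism with $\mu(a) =
\mu_0(a)$ for all $a \in A$, we have $\mu\beta(b) = \mu\alpha(b) = \mu_0\alpha(b) = \mu(b)$ for all $b \in B \setminus B'$ and
$\beta$ is indeed a $B'$-morphism from $M(B, \emptyset, \emptyset, \mu)$ to $M(B', \emptyset, \emptyset, \mu')$.

Note that $M(B', \mathcal{X}, \emptyset, \mu')$ is a submonoid of $M(B, \mathcal{X}, \emptyset, \mu)$ and $\varepsilon$ realizes the
inclusion of these free monoids. Hence $W = \varepsilon(W') = W'$ as words. In particular,
$\sigma(W) = \sigma(\overline{W})$ implies $\sigma'(W') = \sigma'(\overline{W'})$. Thus, $(\alpha\varepsilon, \sigma')$ solves $V'$.

Finally, by definition of $\beta$ we have $\alpha = \alpha\beta$ because $\alpha$ is an $A$-morphism. Hence
$\alpha = \alpha\varepsilon\beta$ and we obtain

\[ \alpha\varepsilon\sigma'(W') = \alpha\varepsilon\sigma'(W) = \alpha\varepsilon\beta\sigma(W) = \alpha\sigma(W). \]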


□ 


Definition 32. Let $\sigma : \Gamma^* \to C^*$ be any $C$-morphism and $W \in \Gamma^*$. The word $W$
is realized as a sequence of positions, say $1, 2, \ldots, |W|$, and each position is labeled
by a letter from $\Gamma$. If $W = u_0x_1u_1 \cdots x_mu_m$, with $u_i \in C^*$ and $x_i \in \Omega$, then we
have $\sigma(W) = u_0\sigma(x_1)u_1 \cdots \sigma(x_m)u_m$. The positions in $\sigma(W)$ corresponding to the
positions of the $u_i$’s are henceforth called visible.
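
A short sketch may help to fix the notion of visible positions. It assumes that W is a list of symbols, that the solution is a dictionary from variables to lists of constants, and that positions are counted from 0 rather than from 1 as in the definition; the function name is hypothetical.

```python
def visible_positions(W, sigma, variables):
    """Return (image, visible) for the word W under the solution sigma.

    image   -- the word sigma(W), as a list of constants
    visible -- 0-based positions of image that stem from constants of W
    """
    image, visible = [], []
    for x in W:
        if x in variables:
            image.extend(sigma[x])       # positions produced by a variable
        else:
            visible.append(len(image))   # position of a constant of W
            image.append(x)
    return image, visible

image, visible = visible_positions(["#", "a", "X", "#"], {"X": ["a", "b"]}, {"X"})
```

In this example the positions coming from the image of X are not visible, and every visible position carries the same label in W and in the image.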

Given $w = \sigma(W)$, each visible position in $w$ can be uniquely identified with a
position in $W$, both positions having the same label in $C$. Following a path satisfying
the forward property makes the length of the equation oscillate. In particular,
throughout the compression method below the algorithm progresses from small state
to small state, but in between the states are not necessarily small.

Proposition 33 shows that every solution can be found by tracing a path in $\mathcal{A}$.
Proposition 33. Let $V = (W, B, \mathcal{X}, \emptyset, \mu)$ be small and let $(\alpha, \sigma)$ be a solution at
$V$. Then $\mathcal{A}$ contains a path $V \xrightarrow{h_1} V_1 \cdots \xrightarrow{h_t} V_t$ to some final state $V_t$ of transitions
satisfying the forward property.

In particular, if $V$ is an initial state, then we have $\sigma(X_i) = h_1 \cdots h_t(c_i)$ for all
$1 \le i \le m$, where $c_1, \ldots, c_m$ are the distinguished letters.

3.10.2. Reduction of Proposition^^ to Lemma\^ As a base case we let X = %■. 
thus, $V = (W, B, \emptyset, \emptyset, \mu)$. If $V$ is final, then there is nothing to do. Otherwise,
by definition of an extended equation, we have $W \in \#B^*\#$ and $|W|_\# = |W_{\mathrm{init}}|_\#$.
Since $\mathcal{X} = \emptyset$, we have $(\alpha, \sigma) = (\alpha, \mathrm{id}_{B^*})$ and we can write

\[ W = \#u_1\# \cdots \#u_{m+2}\#\overline{u_{m+2}}\# \cdots \#\overline{u_1}\#. \]

Define $B_1 = A \cup \{c_1, \overline{c_1}, \ldots, c_{m+2}, \overline{c_{m+2}}\}$ as a disjoint union where $c_1, \ldots, c_m$ are
the distinguished letters. Define $V_1 = (W_1, B_1, \emptyset, \emptyset, \mu_1)$ with

\[ W_1 = \#c_1\# \cdots \#c_m\#c_{m+1}\#c_{m+2}\#\overline{c_{m+2}}\#\overline{c_{m+1}}\#\overline{c_m}\# \cdots \#\overline{c_1}\#. \]

Defining $\mu_1(c_i) = \mu(u_i)$ and $h_1(c_i) = u_i$ yields the desired result. Clearly, $(\alpha h_1, \mathrm{id}_{B_1^*})$
is a solution at the final state $V_1$ and the compression transition $V \xrightarrow{h_1} V_1$ satisfies
the forward property. (Note that we could have some $u_i = 1$, so this is where the
case distinction discussed in Remark 18 is needed.)

The proof of Proposition 33 is by induction on the weight $\|\alpha, \sigma, V\|$. It covers
the rest of this section. Throughout the proof, all transitions satisfy the forward
property by Lemma 29, Lemma 30, and Lemma 31. Therefore, if we know that
$V_i = (W_i, B_i, \mathcal{X}_i, \theta_i, \mu_i)$ has a $B_i$-solution $\sigma_i$ for all $1 \le i \le s$, where $s$ is some
positive integer, then we obtain $\sigma(W) = h_1 \cdots h_s \sigma_s(W_s)$ by the definition of the forward property.

Preprocessing. By the base case we may henceforth assume that $\mathcal{X} \ne \emptyset$. If we
have $\sigma(X) = 1$ for some variable, then we follow a substitution transition removing
the variable, and we are done by induction on the weight.

Thus, without restriction, we can assume $\sigma(X) \ne 1$ for all variables. For each
$X \in \mathcal{X}$, if $\sigma(X) \in aB^*$ we follow a substitution transition defined by $\tau(X) = aX$.
This has the effect of popping out constants at the start and end of each variable,
since each $X$ comes with its involution $\overline{X}$. Since $W$ has at most $4n$ variables
present, the length of $W$ increases by at most $8n$ and the weight $\|\alpha\sigma\|$ decreases.
In case that this substitution leads to a situation where a solution maps $X$ to the
empty word, we remove $X$ and $\overline{X}$. After that we are done by induction on the
weight (since $\|\alpha\sigma\|$ is the dominant term in the lexicographic ordering), unless we
end with $|\tau(W)| > 96n + 6|W_{\mathrm{init}}|$, that is, the new state is not small. In that case
we will have $96n + 6|W_{\mathrm{init}}| < |\tau(W)| \le 104n + 6|W_{\mathrm{init}}|$. Thus, in proving a more
general statement, we will not assume that $V$ is small, but that

$96n + 6|W_{\mathrm{init}}| < |W| \le 104n + 6|W_{\mathrm{init}}|$.

So far, we did not discuss the size of $B$. Assume that we are in the situation of
Lemma 31: there is $B'$ with $A \subseteq B' \subsetneq B$ such that $W \in (B' \cup \mathcal{X})^*$. Then we can use
Lemma 31 and we are done by induction on the weight. Thus, after preprocessing
we may assume that all letters in $B \setminus A$ appear in $W$, that is, $|W|_b \ge 1$ for all
$b \in B \setminus A$.

During the preprocessing we decreased the weight, but at the end of this phase $V$
may no longer be small. Therefore, the proof of Proposition 33 reduces to showing
the following lemma.

Lemma 34. Let $V = (W, B, \mathcal{X}, \emptyset, \mu)$ be a state with a solution $(\alpha, \sigma)$ such that
$\mathcal{X} \ne \emptyset$ and $|W| \le 104n + 6|W_{\mathrm{init}}|$. Then $\mathcal{A}$ contains a path of transitions satisfying
the forward property to some small state $V' = (W', B', \mathcal{X}', \emptyset, \mu')$ with a solution
$(\alpha', \sigma')$ such that $\|\alpha, \sigma, V\| > \|\alpha', \sigma', V'\|$.

3.10.3. Proof of Lemma 34. The assertion of the lemma is trivial if $V$ is small,
that is, if $|W| \le 96n + 6|W_{\mathrm{init}}|$. Hence, we may assume $96n + 6|W_{\mathrm{init}}| < |W| \le
104n + 6|W_{\mathrm{init}}|$. Let $V = (W, B, \mathcal{X}, \emptyset, \mu)$ be a state with a fixed solution $(\alpha, \sigma)$
satisfying the hypothesis of Lemma 34. We describe a way to find a path through
$\mathcal{A}$ in terms of a procedure which “knows” the solution $(\alpha, \sigma)$.

Block compression. We employ block compression only if $W$ contains a factor $b^2$,
where $b \in B$ and $b \ne \#$. Otherwise we move straight to the next procedure, called
pair compression. During the procedure we will increase the length of $W$ by $O(n)$,
but at the end we will arrive at an equation where $|W'| \le |W|$; and importantly, $W'$
will not contain any proper factor $b^2$ with $b \in B$ and $b \ne \#$. We give an example
of this procedure in Section 5.

Remark 35. While this procedure is technical, the idea is quite simple. The goal
is to eliminate long blocks that are visible in the equation. To do so we use
transitions which replace $bb$ by $b$, just two letters at a time. Before we can apply
such a compression, we must ensure that every maximal block $b^\ell$ with at
least part of the block visible has even length. So first we follow various substitution
and compression transitions to arrange this.
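The halving step described in Remark 35 can be sketched on a plain word of constants. This is only an illustration under simplifying assumptions: there are no variables or involution, letters are single characters, and the helper name `halve_blocks` is hypothetical.

```python
import re


def halve_blocks(word, c):
    """Replace every factor cc by c, provided all maximal c-blocks are even.

    Illustration only: in the paper the same effect is obtained by a
    compression transition whose label maps c to cc.
    """
    blocks = re.findall(re.escape(c) + "+", word)
    if any(len(block) % 2 == 1 for block in blocks):
        raise ValueError("some maximal block has odd length; even it out first")
    # Non-overlapping left-to-right replacement halves each even block.
    return re.sub(re.escape(c) + "{2}", c, word)


halved = halve_blocks("accccbcc", "c")
```

The check for odd blocks mirrors the remark: halving is only applied once every relevant block has been arranged to have even length.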

(1) Recording the constants with large exponents. Due to the previous
substitutions $X \mapsto bX$ in the preprocessing step, we have that for each $X$,
if $bX \le W$ and $b'X \le W$ are factors with $b, b' \in B$, then $\# \ne b = b'$. For
each $b \in B \setminus \{\#\}$ define two sets:

$\Lambda_b = \{\lambda \ge 2 \mid \exists\, d b^\lambda e \le \sigma(W) : d \ne b \ne e$ and some $b$ in $d b^\lambda e$ is visible$\}$,

$\mathcal{X}_b = \{X \in \mathcal{X} \mid bX \le W \wedge \sigma(X) \in bB^*\}$.

Note that

(13) $\sum_b |\Lambda_b| + |\mathcal{X}_b| \le |W|$.

By Definition 5 we have $\Lambda_b = \Lambda_{\overline{b}}$. Another fact is crucial: it might be
that there are $X \in \mathcal{X} \setminus \mathcal{X}_b$ with $\sigma(X) \in bB^*$, but then to the left of every
occurrence of $X$ there is (the same) letter $b' \in B \setminus \{\#, b, \overline{b}\}$. In this case
the block compression procedure does not touch the variable $X$ (although it
may change $\sigma(X)$). If, on the other hand, $X \in \mathcal{X}_b$, then a factor $bb$ crosses
the left border for every occurrence of $X$. The first $b$ in such a factor is
visible in $W$, the second one is not.

(2) Introducing the type and renaming of some constants. For each
$b \in B$ with $\Lambda_b \ne \emptyset$ we introduce a fresh letter $c_b \in C \setminus B$ with $\mu(c_b) = \mu(b)$.
In addition, for each $\lambda \in \Lambda_b$ introduce a fresh letter $c_{\lambda,b}$ with $\mu(c_{\lambda,b}) = \mu(b)$.
The fresh letters are chosen such that $\overline{c_b} = c_{\overline{b}}$ and $\overline{c_{\lambda,b}} = c_{\lambda,\overline{b}}$. Note that
$c_{\lambda,b}$ and $c_b$ are just names for formal symbols realized by fresh letters in
the fixed extended alphabet $C$.

We let $B' = B \cup \bigcup \{c_b, \overline{c_b}, c_{\lambda,b}, \overline{c_{\lambda,b}} \mid \lambda \in \Lambda_b \wedge b \in B\}$ and we introduce a
type by $\theta(c_{\lambda,b}) = c_b$ for all $\lambda \in \Lambda_b$. This yields a free partially commutative
monoid $M(B', \mathcal{X}, \theta, \mu)$. We define an $A$-morphism

$h : M(B', \mathcal{X}, \theta, \mu) \to M(B, \mathcal{X}, \emptyset, \mu)$

by $h(c_{\lambda,b}) = h(c_b) = b$. Next, we modify $W$: in every factor $d b^\lambda e$ of $\sigma(W)$
with $d \ne b \ne e$ and $\lambda \in \Lambda_b$ we replace that factor by $d c_b^\lambda e$. This defines a
new word $W'$ such that $h(W') = W$. Note that so far, no $c_{\lambda,b}$ appears
in $W'$. Let $V' = (W', B', \mathcal{X}, \theta, \mu)$. Then $V'$ is a state and we can follow
the transition $V \xrightarrow{h} V'$. We have $\|V'\| < \|V\|$ since $\theta \ne \emptyset$ and this term
appears before the number of constants in the weight of a state. (It might
be that all $b$ are gone, so we cannot make sure that the second component
in the weight decreased.) Note that for each $\lambda \in \Lambda_b$ at least one position
labeled by $c_b$ is visible in $W'$.

We rename $V' = (W', B', \mathcal{X}, \theta, \mu)$ as $V = (W, B, \mathcal{X}, \theta, \mu)$ and rename the
solution as $(\alpha, \sigma)$.

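(3) Splitting the variables starting with special constants. We skip this
step if $\mathcal{X}_b = \emptyset$ for all $b$. Otherwise, for each $b \in B$ and $X \in \mathcal{X}_b$ we write
$\sigma(X) = c_b^\ell w$ for some $\ell \ge 1$ with $w \notin \{b, c_b\}B^*$. We split the variable $X$
by defining $\tau(X) = c_b X' X$ where $X' \in \Omega \setminus \mathcal{X}$ is a fresh variable,
which is assigned a type $\theta'(X') = c_b$. Moreover, we let $\mu'(X') = \mu(c_b)^{\ell-1}$,
$\mu'(X) = \mu(w)$, $\sigma'(X') = c_b^{\ell-1}$ and $\sigma'(X) = w$. The new set of variables is
a disjoint union

$\mathcal{X}' = \mathcal{X} \cup \{X', \overline{X'} \mid b \in B \wedge X \in \mathcal{X}_b\}$.

We obtain a new state $V' = (\tau(W), B, \mathcal{X}', \theta', \mu')$ and a morphism
$\tau : M(B, \mathcal{X}, \theta, \mu) \to M(B, \mathcal{X}', \theta', \mu')$.

The morphism $\tau$ defines a substitution transition $V \xrightarrow{\tau} V'$ which pops a
letter. The new solution at $V'$ is $(\alpha, \sigma')$.

We rename $V' = (\tau(W), B, \mathcal{X}', \theta', \mu')$ as $V = (W, B, \mathcal{X}, \theta, \mu)$ and rename
the solution as $(\alpha, \sigma)$. The next step introduces the letters $c_{\lambda,b}$ into $W$ and
$\sigma(W)$.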

(4) Identifying a position in each block $d c_b^\lambda e$. We represent $W \in M(B, \mathcal{X}, \theta, \mu)$
by any word in $(B \cup \mathcal{X})^*$. For each letter $c_b$, we scan the word $\sigma(W)$ from
left to right and stop at each occurrence of a factor $d c_b^\lambda e$ where $\lambda \in \Lambda_b$ and
$d \ne c_b \ne e$. At the stop we do the following.

• If at least one of the $c_b$'s in this block is visible in $W$, then choose the
left-most corresponding visible position in $W$, and replace the label $c_b$
at this visible position by $c_{\lambda,b}$. In $\sigma(W)$, replace $d c_b^\lambda e$ by $d c_{\lambda,b} c_b^{\lambda-1} e$.

• If no position of the $c_b$'s in this block is visible in $W$, then we make
no change.

Thus, from left to right, we transform the word $W$ into an element $W' \in
M(B, \mathcal{X}, \theta, \mu)$ and simultaneously $\sigma(W)$ into an element $\sigma'(W') \in M(B)$.

We obtain a new state $V' = (W', B, \mathcal{X}, \theta, \mu)$ and we can follow the arc
$V \xrightarrow{h} V'$ where $h$ is the $A$-morphism defined by a renaming $h(c_{\lambda,b}) = c_b$.
Note that $\|V\| > \|V'\|$ since for each $c_{\lambda,b}$ a factor $c_{\lambda,b} c_b$ appears in $W'$,
so there are more letters visible in $W'$ than in $W$, which decreases the
second component in the weight of an extended equation. At $V'$ we obtain
a new solution $(\alpha, \sigma')$; and as usual, we rename $V' = (\tau(W), B, \mathcal{X}', \theta', \mu')$
as $V = (W, B, \mathcal{X}, \theta, \mu)$ and rename the solution as $(\alpha, \sigma)$.

Due to partial commutation we have the following: if a factor $f \in
d\{c_b, c_{\lambda,b}\}^\ell e$ occurs in $\sigma(W)$ with $d, e \notin \{c_b, c_{\lambda,b}\}$, then we have $\ell = \lambda \in \Lambda_b$,
and $f = d c_{\lambda,b} c_b^{\lambda-1} e \in M(B, \emptyset, \theta, \mu)$. Moreover, if $\theta(X) = c_b$, then $X$
commutes with the letter $c_b$, but $X$ does not commute with any $c_{\lambda,b}$.
(5) The block compression. As long as there exists a letter Cb which occurs 
in a(W), perform the following loop, which also finishes the block compres¬ 
sion. During the following loop we maintain the invariant: if dcA.fcC^e and 
d'cx^bcl e' are factors of a{W) with d ^ Cb ^ e and d' ^ Cb ^ e', then £ = £' 
and a{W) contains a factor dcXjiCb^e as well. During the loop we perform 
various times a renaming in order to keep the notation V and (a, cr) at the 
current states. Initially we define a list 

$\Lambda_B = \{b \in B \mid \Lambda_b \ne \emptyset\}$.

while $\Lambda_B \ne \emptyset$ do

(a) For some $b \in \Lambda_B$ remove $b$ and $\overline{b}$ from $\Lambda_B$;

(b) Let $c = c_b$ and for all $\lambda \in \Lambda_b$ abbreviate $c_{\lambda,b}$ as $c_\lambda$.

(c) while $|\sigma(W)|_c \ge 1$ do

(i) For all $X$ with $\theta(X) = c$ where $|\sigma(X)|$ is odd, follow a
substitution transition of type $X \mapsto cX$. Hence, we may assume that
$|\sigma(X)|$ is even for all $X$ with $\theta(X) = c$.

(ii) Remove all $X$ from $\mathcal{X}$ where $\sigma(X) = 1$. Observe, if there remains
a variable $X$ with $\theta(X) = c$, then $\sigma(W)$ contains a factor $c^2$.

(iii) For all $c_\lambda$ where $\sigma(W)$ contains a factor $d c_\lambda c^\ell e$ where $d \ne c \ne e$
and $\ell$ is odd, follow a compression transition with $h(c_\lambda) = c c_\lambda$.
In order to see that this is possible observe that for every
occurrence of such a factor $d c_\lambda c^\ell e$ there are only two possibilities.
Either none of the positions of $c_\lambda c^\ell$ are visible in $W$, or the
position of $c_\lambda$ is visible in $W$. Moreover, $c$ commutes with $c_\lambda$ and
with all $X$ where $\theta(X) = c$; and $|\sigma(X)|$ is even for those $X$.
Thus, wherever $c_\lambda$ is visible in $W$, the factor $c c_\lambda$ is visible in
$W \in M(B, \mathcal{X}, \theta, \mu)$.

Still, we need to be more precise in order to guarantee a weight
reduction. The $A$-morphism defined by $h(c_\lambda) = c c_\lambda$ leads to a
new element $W' \in M(B, \mathcal{X}, \theta, \mu)$ and a new solution $(\alpha h, \sigma')$.
In case that no letter $c$ occurs in $\sigma'(W')$ anymore, the letter $c$
and the type become useless. Thus, if $|\sigma'(W')|_c = 0$, then we
actually follow a compression transition

$V \xrightarrow{h} (W', B', \mathcal{X}, \theta', \mu)$

where $B' = B \setminus \{c, \overline{c}\}$ and hence $|\theta'| < |\theta|$. Nevertheless $\|V\| >
\|V'\|$ since $|W'| < |W|$ due to compression.

(iv) If there exists a variable $X$ with $\theta(X) = c$, then we know $\sigma(X) =
c^2 c^\ell$ where $\ell$ is even. We follow a substitution arc defined by
$X \mapsto c^2 X$ in order to guarantee that a factor $c^2$ becomes visible
in $W$.

(v) Due to the previous steps: either we have $c \notin B$ or $W$ contains
a visible factor $c^2$. In the first case, we skip this step. Thus,
we assume that $W$ contains a visible factor $c^2$. Now, if $\sigma(W)$
contains a factor $d c_\lambda c^\ell e$ where $d \ne c \ne e$, then $\ell$ is even; and
if $\theta(X) = c$, then $\sigma(X) = c^j$ and $j$ is even, too. Thus we can
follow a compression transition defined by $h(c) = c^2$. This leads
to a new equation $W'$ with $h(W') = W$ and new solution $\sigma'(W')$
and the number of occurrences of $c$ and $\overline{c}$ is halved. Note that
$\|V\| > \|V'\|$ since $W$ contains a factor $c^2$. Hence, $|W| > |W'|$.
Rename the parameters to $V, W, B, \mathcal{X}, \theta, \mu, \alpha, \sigma$.
endwhile

(d) Rename all $c_\lambda$ by $c_{\lambda,b}$.

endwhile
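The inner loop (5c) can be sketched in a strongly simplified model. The sketch tracks, for each marker letter $c_\lambda$, only the number of letters $c$ that follow it; variables, involution and visibility are ignored, and the function name `compress_blocks` is hypothetical.

```python
def compress_blocks(lambdas):
    """Count the rounds of loop (5c) needed to remove all letters c.

    Simplified model (an assumption of this sketch): a block of length
    lam is stored as c_lam followed by lam - 1 copies of c.
    """
    remaining = {lam: lam - 1 for lam in lambdas}
    rounds = 0
    while any(remaining.values()):
        for lam, ell in remaining.items():
            if ell % 2 == 1:
                ell -= 1              # step (iii): h(c_lam) = c c_lam absorbs one c
            remaining[lam] = ell // 2  # step (v): h(c) = cc halves the block
        rounds += 1
    return rounds


rounds_for_example = compress_blocks([4, 5])
```

For blocks of lengths 4 and 5 the sketch needs three rounds, and in general the number of rounds grows only logarithmically with the longest block.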


Space requirements for the block compression. Let us show that the block
compression can be realized inside $\mathcal{A}$.


Lemma 36. Let $V = (W, B, \mathcal{X}, \emptyset, \mu)$ be the state after preprocessing, when we enter
“block compression”, and let $V' = (W', B', \mathcal{X}', \emptyset, \mu')$ be the state at the end of block
compression. Then $V'$, as well as all intermediate states between $V$ and $V'$, are in
$\mathcal{A}$. Moreover, $|W'| \le 104n + 6|W_{\mathrm{init}}|$.


Proof. At the end of block compression we have $\mathcal{X}' \subseteq \mathcal{X}$, and each visible position
of the new letter $c_{\lambda,b}$ occupies a position where some letter $b$ was visible in $W$.
Thus, $|W'| \le |W| \le 104n + 6|W_{\mathrm{init}}|$.

To show that the procedure stays inside $\mathcal{A}$ we calculate the maximum length
of an intermediate equation during the process. We start block compression with
$|W| \le 104n + 6|W_{\mathrm{init}}|$, and $|\mathcal{X}| \le 4n$. In step (3) we add at most $8n$ new variables
$X'$ and at most $8n$ constants (we may substitute a variable $X$ by $aX'XX''b$ in the
case that $\sigma(X) = a^\ell w b^k$). So the length of the intermediate equation at this step
is at most $104n + 6|W_{\mathrm{init}}| + 16n = 120n + 6|W_{\mathrm{init}}|$. The only other step of block
compression that adds length to the equation is the inner while-loop in step
(5).

We start this loop with $|W| \le 120n + 6|W_{\mathrm{init}}|$ and with at most $8n$ typed variables
(the variables that were added in step (3)). We perform the loop at step (5c) with
one letter $c \in \Lambda_B$ fixed.

In step (i) we pop at most one $c$ letter for each typed variable, and in step (iv)
we pop $c^2$ for each typed variable, so we add at most $3 \cdot 8n = 24n$ $c$'s, and then in
step (v) we halve the number of $c$'s, so overall we add at most $12n$ $c$'s. We repeat
this loop until all $c$'s are eliminated. In each iteration we add at most $24n$ new $c$
letters, but then divide the total number of $c$ letters by 2. If we just consider the
number of new $c$ letters added from the start of the while loop, we see that after
each iteration the number of new $c$ letters remaining is at most:


iteration   number before   number added   number before   number after
            step (i)                       step (v)        step (v)
1           0               24n            24n             12n
2           12n             24n            36n             18n
3           18n             24n            42n             21n
4           21n             24n            45n             23n
5           23n             24n            48n             24n


Thus the total length of $W$ is never more than

(14) $120n + 6|W_{\mathrm{init}}| + 48n = 168n + 6|W_{\mathrm{init}}|$.

Since this call of the inner while-loop eliminates all occurrences of the letter
$c$, at the end of each call the length of $W$ returns to being bounded above by
$120n + 6|W_{\mathrm{init}}|$, when we repeat the while-loop at step (5c) for another constant in
$\Lambda_B$, until $\Lambda_B = \emptyset$. Thus all states are in $\mathcal{A}$. □
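The counting argument in the proof of Lemma 36 amounts to iterating "add at most $24n$ letters, then halve". A minimal sketch of this recurrence, in units of $n$ and rounding up after halving (the rounding convention is an assumption of this sketch), shows that the number of new letters before halving never exceeds $48n$:

```python
from math import ceil


def new_c_letters(iterations, added=24):
    """Iterate 'add `added` letters, then halve', counting in units of n.

    Returns the number of new c letters left after each iteration and the
    largest count seen just before a halving step.
    """
    remaining = 0
    peak = 0
    history = []
    for _ in range(iterations):
        before_halving = remaining + added
        peak = max(peak, before_halving)
        remaining = ceil(before_halving / 2)
        history.append(remaining)
    return history, peak


history, peak = new_c_letters(10)
```

The sequence stabilizes at $24n$, which is the fixed point of $x \mapsto (x + 24n)/2$, so the peak of $48n$ matches the additive term in (14).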

For the final state $V' = (W', B', \mathcal{X}', \emptyset, \mu')$ the type relation is empty. If $V'$ is
small, that is, $|W'| \le 96n + 6|W_{\mathrm{init}}|$, then Lemma 34 is shown. Thus, without
restriction we again have

$96n + 6|W_{\mathrm{init}}| < |W'| \le 104n + 6|W_{\mathrm{init}}|$.

Pair compression. After block compression we run pair compression, following
essentially the formulation of Jeż's original procedure [12]. We start a pair
compression at a state $V_p = (W, B, \mathcal{X}, \emptyset, \mu)$ where we have:

• $|W|_b \ge 1$ for all $b \in B \setminus A$.

• $96n + 6|W_{\mathrm{init}}| < |W| \le 104n + 6|W_{\mathrm{init}}|$.

• $W$ doesn't contain any proper factor $b^2$ with $b \in B \setminus \{\#\}$.

• The current solution is denoted by $(\alpha, \sigma)$.

The goal of the process is to end at a state $V_q = (W'', B', \mathcal{X}'', \emptyset, \mu'')$ with $|W''| \le
96n + 6|W_{\mathrm{init}}|$ by some path satisfying the forward property and without increasing
the weight. Moreover, there will be no types in this phase. Note that the constraints
make sure that $\sigma(X)$ does not contain any factor $a\overline{a}$, but we cannot rule out that
$W$ contains such factors. However, the number of $a\overline{a}$ factors remains bounded by
$|W_{\mathrm{init}}|$, since they can only occur after preprocessing $W_{\mathrm{init}}$.

Consider all partitions $B \setminus \{\#\} = L \cup R$ such that $b \in L \iff \overline{b} \in R$. Note that
there is no overlap between factors $ab, cd \in LR$ unless $ab = cd$. Moreover

$ab \in LR \iff \overline{b}\,\overline{a} \in LR$.

For each choice of $(L, R)$ we count the number of positions in $W$ where some factor
$ab \in LR$ with $a \ne \overline{b}$ begins. We intend to compress all these factors into single
letters.

Remark 37. We choose and fix one of the partitions $(L, R)$ such that the number
of factors $ab \in LR$ in $\sigma(W)$ such that $a \ne \overline{b}$ and at least one of $a$ or $b$ is visible is
maximal.

We say that $ab \in LR$ is crossing if $W$ contains either a factor $aX$ with $\sigma(X) \in
bB^*$ or a factor $\overline{b}X$ with $\sigma(X) \in \overline{a}B^*$ (or both). In the first phase we run the
following procedure.

Uncrossing. Create a list $\mathcal{L} = \{X \in \mathcal{X} \mid \exists b \in R : \sigma(X) \in bB^*\}$.

For each $X \in \mathcal{L}$:

• choose $b \in R$ such that $\sigma(X) \in bB^*$ and follow a substitution transition
$X \mapsto bX$.

This concludes the “uncrossing”; and, as done previously, we rename the parameters
to $V, W, B, \mathcal{X}, \mu, \alpha, \sigma$.

Above, when we follow $X \mapsto bX$ with $b \in R$, then automatically $\overline{X}$ is replaced
with $\overline{X}\,\overline{b}$, and $\overline{b} \in L$. We also have $\{X, \overline{X}\} \subseteq \mathcal{L}$ if and only if $\sigma(X) \in bB^*a$ for
some $ab \in LR$. In that case we actually substituted $X$ by $bXa$ and $\overline{X}$ by $\overline{a}\,\overline{X}\,\overline{b}$.
Recall that we have at most $4n$ variables in $W$. Thus, at this stage we have:

(15) $|W| \le 104n + 6|W_{\mathrm{init}}| + 8n = 112n + 6|W_{\mathrm{init}}|$.

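The second phase begins with creating a list $\mathcal{P} = \{ab \in LR \mid a \ne \overline{b}\}$. After that
we run the following while-loop.

while $\mathcal{P} \ne \emptyset$ do

(1) Define

$B' = A \cup \{a \in B \mid |W|_a \ge 1 \vee \exists X \in \mathcal{X} : \sigma(X) \in aB^*\}$.

If $B' \ne B$, then follow a substitution transition $V \xrightarrow{\varepsilon} (W, B', \mathcal{X}, \emptyset, \mu)$ where
the label $\varepsilon = \mathrm{id}_{C^*}$ yields the inclusion of $M(B', \emptyset, \emptyset, \mu)$ into $M(B, \emptyset, \emptyset, \mu)$.
Rename the parameters to $V, W, B, \mathcal{X}, \mu, \alpha, \sigma$.

(2) Select and remove some pair $ab$ in $\mathcal{P}$. If $ab$ does not occur as a factor in
$W$, then do nothing, else perform the next steps.

(3) Choose a fresh letter $c = c_{ab} \in C \setminus B$ with $\mu(c) = \mu(ab)$ and let $B'' =
B \cup \{c, \overline{c}\}$. Define an $A$-morphism

$h : M(B'', \mathcal{X}, \emptyset, \mu') \to M(B, \mathcal{X}, \emptyset, \mu)$

by $h(c) = ab$.

(4) Replace in $W$ all factors $ab$ by $c$ and all factors $\overline{b}\,\overline{a}$ by $\overline{c}$. Let $W' \in (B'' \cup \mathcal{X})^*$
be the new word and $V' = (W', B'', \mathcal{X}, \emptyset, \mu')$ be the new state. We have
$W = h(W')$; and hence there is a compression transition

$V \xrightarrow{h} V'$.

(5) Follow the compression transition $V \xrightarrow{h} V'$, and rename the parameters
to $V, W, B, \mathcal{X}, \mu, \alpha, \sigma$.

endwhile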
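The replacement performed in steps (3) and (4) can be sketched on a word of constants. The sketch below is an illustration under simplifying assumptions: involution, variables and the condition $a \ne \overline{b}$ are ignored, and the names `compress_pairs` and `fresh` are hypothetical.

```python
def compress_pairs(word, left, right, fresh=None):
    """Replace every factor ab with a in `left` and b in `right` by a fresh letter.

    `word` is a list of letter names. Because `left` and `right` are disjoint,
    two such factors never overlap, so one left-to-right scan suffices.
    """
    fresh = {} if fresh is None else fresh
    out = []
    i = 0
    while i < len(word):
        if i + 1 < len(word) and word[i] in left and word[i + 1] in right:
            pair = (word[i], word[i + 1])
            # One fresh letter c_ab per pair, reused for every occurrence.
            out.append(fresh.setdefault(pair, "c_" + pair[0] + pair[1]))
            i += 2
        else:
            out.append(word[i])
            i += 1
    return out


compressed = compress_pairs(list("daeabadabbba"), {"b", "d", "e"}, {"a"})
```

With $L \supseteq \{b, d, e\}$ and $R \ni a$, the twelve-letter word shrinks to seven letters, since five disjoint pairs from $LR$ are replaced.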


Lemma 38. During the while-loop for pair compression the following properties
hold.

(1) After the first step, where the new alphabet $B'$ is created (and then renamed
as $B$) we have $|B| \le |W| + 2$.

(2) No factor $ab \in LR$ ever becomes crossing.

(3) At each step where we move from state $V$ to $V'$ we have $\|V\| > \|V'\|$.

(4) Each transition satisfies the forward property.

Proof. (1) In the first step inside the loop, when the new alphabet $B'$ is created,
we have $|B'| \le |W|$. Therefore, after the first renaming, we have $|B| \le
|W|$. When we define $B''$, we add two new letters. Hence, we obtain
$|B''| \le |W| + 2$, which yields, after renaming, $|B| \le |W| + 2$. This property
persists during subsequent loops.

(2) We have to show that no factor $ab \in LR$ ever becomes crossing. To see
this, consider the alphabet reduction by following the transition $V \xrightarrow{\varepsilon}
(W, B', \mathcal{X}, \emptyset, \mu)$ with $B' \ne B$. It involves replacing every letter $a \in B \setminus B'$
by $\alpha(a)$ according to Lemma 31. The potential problem is that we might
have $a \in L$, but $\alpha(a)$ starts with a letter in $R$, so we might create new $LR$
factors. However, as $B'$ contains all letters $a$ where $\sigma(X) \in aB^*$ for some
$X$, we never introduce any new crossing pairs.

(3) The assertion $\|V\| > \|V'\|$ is trivial.

(4) The transition $V \xrightarrow{\varepsilon} (W, B', \mathcal{X}, \emptyset, \mu)$ with $B' \ne B$ satisfies the forward
property by Lemma 31. In order to see that $V \xrightarrow{h} V'$ satisfies the forward
property when we have $h(c) = ab$ we proceed as follows. As done for $W$,
also replace in $\sigma(W)$ all factors $ab$ by $c$ and all factors $\overline{b}\,\overline{a}$ by $\overline{c}$. Since $ab$ is
not crossing, we find a $B''$-morphism

$\sigma' : M(B'', \mathcal{X}, \emptyset, \mu') \to M(B'', \emptyset, \emptyset, \mu')$

such that $\sigma(X) = h\sigma'(X)$ for all variables $X$. Thus, we obtain $(\alpha h, \sigma')$ as
a solution at $V'$.

□


Lemma 39. Let $V_p = (W, B, \mathcal{X}, \emptyset, \mu)$ be a state in $\mathcal{A}$ with a solution $(\alpha, \sigma)$ where
$96n + 6|W_{\mathrm{init}}| < |W| \le 104n + 6|W_{\mathrm{init}}|$ such that $W$ doesn't contain any factor
$d^2$ for $\# \ne d \in B$. Let $(L, R)$ be the partition with $B \setminus \{\#\} = L \cup R$ according
to the choice made in Remark 37. Then pair compression on $V_p$ leads to a state
$V_q = (W'', B', \mathcal{X}, \emptyset, \mu'')$ with $|W''| \le 96n + 6|W_{\mathrm{init}}|$, that is, the state $V_q$ is small.
Moreover, the intermediate steps of the pair compression algorithm are performed
within $\mathcal{A}$.


Proof. Recall that the NFA $\mathcal{A}$ is trim. Hence, there is a path

$V_0 \xrightarrow{h_1} \cdots \xrightarrow{h_{p-1}} V_{p-1} \xrightarrow{h_p} V_p$

from an initial state with the appropriate $\mu$, to $V_p$. Let $V_i = (W_i, B_i, \mathcal{X}_i, \theta_i, \mu_i)$.
We perform the following marking process. The idea is that we wish to mark all
constants in the $W_i$ which could possibly give rise to a factor $a\overline{a}$ in $W$. These
factors can arise in exactly two ways: the initial equation may be unreduced to
start with, or from a substitution (for example, we may have $\overline{a}X$ or $YZ$ factors of
the initial equation and we pop $X \to aX$ or $Y \to Y\overline{a}$, $Z \to aZ$).

(1) In $W_0 = W_{\mathrm{init}}$ we mark all letters (both constants and variables).

(2) If $V_{i-1} \xrightarrow{\tau} V_i$ is a substitution transition, $W_i = \tau(W_{i-1})$ and the positions
with constants in $W_{i-1}$ are mapped to positions with constants in $W_i$. We
mark constants in $W_i$ that come from marked constants in $W_{i-1}$, and if
$\tau(X) \in a\mathcal{X}^*$ and $X$ is marked in $W_{i-1}$, we mark the newly added $a$ on the
left of the variable $X$ in $W_i$, and leave $X$ unmarked. If $\tau(Y) = Y$ and $Y$
is marked in $W_{i-1}$, we leave $Y$ marked in $W_i$. Note that in this way each
marked variable gives rise to exactly one marked letter.

(3) If $V_{i-1} \xrightarrow{h} V_i$ is a compression transition, then we have $h(W_i) = W_{i-1}$.
Mark a constant $c$ in $W_i$ if it is mapped by $h$ to an occurrence of a factor
containing a marked position in $W_{i-1}$.

Note that since the pair compression procedure is always preceded by the
preprocessing step above, we can assume that every variable $X$ in $W_{\mathrm{init}}$ has been replaced
by $aX$ where $a$ is marked, so in $V_p$ the word $W$ contains at most $|W_{\mathrm{init}}|$ marked
constants and no marked variables.

When we run the pair compression procedure on $W$ we cannot compress pairs
$a\overline{a}$, or pairs containing variables. If we now mark all variables present in $W$, then
we are allowed to compress any pairs of letters in $W$ that are unmarked. After
marking the variables we have at most $2|W_{\mathrm{init}}|$ marked letters in $W$.

Let us factor the word $W \in (B \cup \mathcal{X})^*$ as $W = x_0 u_1 x_1 \cdots u_\ell x_\ell$, where $\ell$ is chosen
to be maximal such that for all $1 \le i \le \ell$ we have:

(1) $x_i \in (B \cup \mathcal{X})^*$.

(2) $u_i \in (B \setminus \{\#\})^*$ and $u_i$ doesn't contain any marked position.

(3) The length of each $u_i$ is exactly 3.

The factorization enjoys the following properties.

• Since all $\#$'s are marked, we have $x_0 \ne 1 \ne x_\ell$. Some other $x_i$ can be
empty.

• Since we require $|u_i| = 3$ it may be that $x_i$ contains for each marked position
also two unmarked positions. The exception is the first position in $x_0$.
Hence, we obtain

$\sum_{0 \le i \le \ell} |x_i| \le 3(2|W_{\mathrm{init}}|) - 2 < 6|W_{\mathrm{init}}|$.

• Since $|W| - 6|W_{\mathrm{init}}| > 96n$, the previous line yields

$\ell \ge 32n$.

Consider the word $W'$ which was obtained via the substitution transitions, but
before the compression of factors $ab \in LR$ into single letters. The increase in length,
which is $|W'| - |W|$, comes from the substitution transitions $X \mapsto bX$, $\overline{X} \mapsto \overline{X}\,\overline{b}$
with $X \in \mathcal{L}$, so the length goes up by at most $8n$. Note that the $u_i$ factors do not
change, only the $x_i$ factors do. Hence $W'$ has the factorization $W' = y_0 u_1 y_1 \cdots u_\ell y_\ell$
with $y_i \in (B \cup \mathcal{X})^*$ and

(16) $|y_0 \cdots y_\ell| \le |x_0 \cdots x_\ell| + 8n$.

Finally, let $W''$ be the word obtained after pair compression has been performed.
The word $W''$ is the compression of some word $y_0 v_1 y_1 \cdots v_\ell y_\ell$ where each $v_i$ is
the result of the compression restricted to $u_i$.

Each $u_i$ can be written as $u_i = abc$ with $a, b, c \in B$. Since $W$ did not contain
any proper factor $d^2$ with $d \in B$ by hypothesis (and as we have performed block
compression first), we know $a \ne b \ne c$. Moreover, we cannot have $\overline{a} = b$ or $\overline{c} = b$
because in every occurrence of $b\overline{b}$ in $W$ at least one position is marked.

Assume for a moment that membership to $L$ or $R$ was defined uniformly at
random. That is, for each $\# \ne a \in B$ the probability for $a\overline{a} \in LR$ is $\frac{1}{2}$ and
independent of the other events “$b\overline{b} \in LR$” for $a \ne b$.

There are two possibilities: either $b \in L$ or $b \in R$. In the first case, either
$c \in R$ or $c \in L$, and in the second case either $a \in L$ or $a \in R$. Each event
$bc \in LR$, $bc \in LL$, $ab \in LR$, $ab \in RR$ has probability $\frac{1}{4}$, so with probability $\frac{1}{2}$
one pair in the factor $u_i$ is compressed: thus the expected length of a factor $v_i$ is
$E[|v_i|] = \frac{2}{2} + \frac{3}{2} = \frac{5}{2}$. By linearity of expectation, we obtain

(17) $E[|v_1 \cdots v_\ell|] = \frac{5\ell}{2}$.

Thus if the partition $(L, R)$ were chosen at random, we expect the length of the
word $u_1 \cdots u_\ell$ to decrease from $3\ell$ to $\frac{5\ell}{2}$ or less, that is, we expect at least $\frac{\ell}{2}$ factors
$u_i$ are compressed (each $v_i$ has length either 2 or 3). But in Remark 37 we made
the best choice of compressing a maximal number of pairs in $W'$. This means at
least $\frac{\ell}{2}$ factors of $W$ are compressed. Hence, for the actual pair compression, we
may estimate the length of $W''$ as follows.

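$|W''| \le |x_0 \cdots x_\ell| + 8n + \frac{5\ell}{2}$     since $\frac{\ell}{2}$ factors are compressed

$= |W| + 8n - \frac{\ell}{2}$     since $|W| = |x_0 \cdots x_\ell| + 3\ell$

$\le |W| - 8n$     since $\ell \ge 32n$

$\le 96n + 6|W_{\mathrm{init}}|$     since $|W| \le 104n + 6|W_{\mathrm{init}}|$.

Since $|W''| \le 96n + 6|W_{\mathrm{init}}|$, the last state $V_q = (W'', B', \mathcal{X}, \emptyset, \mu'')$ is small. □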
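The probabilistic step in the proof can be checked by brute force. The sketch below assumes that the three letters of a factor $u_i = abc$ are assigned to $L$ or $R$ independently and uniformly, as in the thought experiment of the proof; the function name `expected_compressed_length` is hypothetical.

```python
from fractions import Fraction
from itertools import product


def expected_compressed_length():
    """Average length of v_i over all assignments of a, b, c to L or R."""
    total = Fraction(0)
    assignments = list(product("LR", repeat=3))
    for side_a, side_b, side_c in assignments:
        # A pair is compressed when it lies in LR; ab and bc cannot both
        # qualify, because b cannot be in L and in R at the same time.
        compressed = (side_a, side_b) == ("L", "R") or (side_b, side_c) == ("L", "R")
        total += 2 if compressed else 3
    return total / len(assignments)


expectation = expected_compressed_length()
```

The enumeration gives the value $\frac{5}{2}$ used in (17): half of the eight assignments compress exactly one pair.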

A linear bound on the size of $C$ is evident from the proofs above and an explicit
bound is given next. Thus, we have shown Lemma 34.

3.10.4. The size of the extended alphabet $C$: the choice of $\kappa$. The longest equation
$W$ we needed to establish completeness occurs during block compression, where we
found that $|W| \le 168n + 6|W_{\mathrm{init}}|$, see (14). Combining this with the bound $|W_{\mathrm{init}}| \le 6n$
established earlier, we obtain

(18) $|W| \le 168n + 36n = 204n$.

The largest alphabet we ever needed during block and pair compression was less
than

$3 \cdot (|A_+| + |W|) \le 3 \cdot (n + 204n) = 3 \cdot 205n = 615n$.

Thus, we can choose $\kappa$ such that

(19) $|C| = \kappa \cdot n = 615n$.

3.10.5. Finishing the proof of Theorem^ in the monoid case. Lemma 1341 implies 
ProDOsition bv the reduction in Subsection 13.10.21 This in turn oroves m 
in Theorem m in the monoid case M(A) = A*. Clearly, {{h{ci),..., h{cm)) G 
C* X ■■■ X C* \ h G L{A)} is empty if and only if L{A) =0. It remains to show 
that A contains a directed cycle if and only if {U,V) has infinitely many solutions. 
If there is no cycle, then L{A) is finite and {U, V) can have only finitely many 
solutions. The converse has been shown in Corollary [26l 

4. Proof of Theorem [H in the group case: M(A) = F(A+) 

The proof is a reduction to the monoid case. Recall that $A = A_\pm \cup \{\#\}$, $F$ is the
subset of reduced words in $A_\pm^*$, and $\pi : A_\pm^* \to F(A_+)$ is the canonical projection.

We start with an equation $(U, V)$ in the free group $F(A_+)$, where $U, V \in (A \cup \mathcal{X})^*$,
$\mathcal{X} = \{X_1, \overline{X_1}, \ldots, X_m, \overline{X_m}\}$, and solutions are $A$-morphisms $\sigma : (A \cup \mathcal{X})^* \to F$
such that $\pi\sigma(U) = \pi\sigma(V)$. In a first phase we transform the equation $(U, V)$ into
a system of triangular equations, where triangular means $1 \le |UV| \le 3$. We may
assume $UV \ne 1$. If $|UV| \le 3$, then the equation is already triangular. Hence, let
us assume $|UV| \ge 4$. Since we are in the group case we may also assume $|V| = 1$.
Write $U = x_1 \cdots x_p$ with $x_i \in A \cup \mathcal{X}$ and $p \ge 3$. Next, we introduce a new variable
$X$ and replace $x_1 \cdots x_p = V$ by the system

$x_1 \cdots x_{p-1} = X \;\wedge\; X x_p = V$.

We iterate until the system is triangular. The procedure introduces more variables,
but it does not change the set of solutions. More formally, if $\{(U_i, V_i) \mid 1 \le i \le t\}$
is the system of triangular equations we obtained above, then

$\{(\sigma(X_1), \ldots, \sigma(X_m)) \in F \times \cdots \times F \mid \pi\sigma(U) = \pi\sigma(V)\}$

$= \{(\sigma(X_1), \ldots, \sigma(X_m)) \in F \times \cdots \times F \mid \forall\, 1 \le i \le t : \pi\sigma(U_i) = \pi\sigma(V_i)\}$.
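The triangulation step can be sketched as a small loop. The representation of an equation as a pair (list of symbols, single symbol) and the fresh variable names `T1`, `T2`, ... are assumptions made for this illustration only.

```python
def triangulate(lhs, rhs):
    """Split x_1 ... x_p = V (with |V| = 1) into equations with |U_i V_i| <= 3.

    `lhs` is a list of symbols and `rhs` a single symbol. Each round replaces
    x_1 ... x_p = V by x_1 ... x_{p-1} = T and T x_p = V for a fresh T.
    """
    system = []
    count = 0
    while len(lhs) > 2:
        count += 1
        fresh = "T" + str(count)          # hypothetical name for a new variable
        system.append(([fresh, lhs[-1]], rhs))
        lhs, rhs = lhs[:-1], fresh
    system.append((lhs, rhs))
    return system


system = triangulate(["X", "a", "Y", "b"], "Z")
```

Each round peels off the last symbol of the left-hand side, so an equation with $p$ symbols on the left yields $p - 1$ triangular equations.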

The crucial step in our reduction is to switch from solutions over free groups to 
solutions over free monoids with involution. We do this using the following lemma, 
whose geometric interpretation is simply that the Cayley graph of a free group (over 
standard generators) is a tree. 

Lemma 40. Let $x, y, z$ be reduced words in $A_\pm^*$. Then $xy = z$ holds in the group
$F(A_+)$ (i.e. $\pi(xy) = \pi(z)$) if and only if there are reduced words $P, Q, R$ in $A_\pm^*$
such that $x = PR$, $y = \overline{R}Q$, and $z = PQ$ holds in the free monoid $A_\pm^*$.



Figure 3. Paths corresponding to geodesic words for $x, y, z$ with
$xy = z$ in the Cayley graph of $F(A_+)$ with standard generators,
as in Lemma 40. The geodesics to vertices $x$ and $z$ split after an
initial path labeled by $P$.


Proof. The direction from right to left is trivial, whether or not $P, Q, R$ are reduced.
For the other direction there are two cases. First, $xy$ is a reduced word. Then we
can choose $P = x$, $R = 1$, $Q = y$, and we are done. Second, we have $x = x'a$
and $y = \overline{a}y'$ for some letter $a \in A_\pm$, so $x'y' = z$ holds in the group $F(A_+)$. By
induction, there are reduced words $P, Q, R'$ with $x' = PR'$, $y' = \overline{R'}Q$, $z = PQ$ in
$A_\pm^*$. We can define $R = R'a$, which is reduced due to the equation $x = x'a = PR'a$
and the fact that $x$ is reduced. The result is now immediate. □
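The decomposition of Lemma 40 can be computed by cancelling the longest suffix of $x$ against the matching prefix of $y$. The sketch below uses a convention that is an assumption of this example: a lowercase letter is a generator and the corresponding uppercase letter is its inverse.

```python
def inverse(word):
    """Formal inverse: reverse the word and swap the case of every letter."""
    return word[::-1].swapcase()


def split_product(x, y):
    """For reduced x and y, return (P, Q, R) with x = P R and y = inverse(R) Q.

    R is the longest suffix of x that cancels against a prefix of y, so the
    word P Q is reduced and represents the product xy in the free group.
    """
    k = 0
    while k < min(len(x), len(y)) and x[len(x) - 1 - k] == y[k].swapcase():
        k += 1
    return x[: len(x) - k], y[k:], x[len(x) - k:]


P, Q, R = split_product("abA", "aBc")
```

In this example two letters cancel, so $R$ has length 2 and the reduced product is the two-letter word $PQ$.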

The consequence of Lemma 40 is that with the help of fresh variables $P, Q, R$ we
can substitute every equation $xy = z$ with $x, y, z \in \{1\} \cup A_\pm \cup \mathcal{X}$ in $F(A_+)$ by the
following three word equations to be solved over a free monoid with involution:

(20) $x = PR$, $y = \overline{R}Q$, $z = PQ$.

More precisely, in the third phase of the transformation we replace each $U_i = V_i$,
where $U_i = x_i y_i$ and $V_i = z_i$, by the three equations

(21) $x_i = P_i R_i$, $y_i = \overline{R_i} Q_i$, $z_i = P_i Q_i$.

Thus, for $s = 3t \le 3|UV|$ we obtain a new system of triangular word equations
$\{(U'_i, V'_i) \mid 1 \le i \le s\}$ such that

(22) $\{(\sigma(X_1), \ldots, \sigma(X_m)) \in F \times \cdots \times F \mid \pi\sigma(U) = \pi\sigma(V)\}$

(23) $= \{(\sigma(X_1), \ldots, \sigma(X_m)) \in F \times \cdots \times F \mid \forall\, 1 \le i \le s : \sigma(U'_i) = \sigma(V'_i)\}$.

Note that the morphism $\pi$ is not present in (23), since (23) refers to a system of
equations over a free monoid with involution.

The final step is to encode the system $\{(U'_i, V'_i) \mid 1 \le i \le s\}$ into a single word
equation $(U'', V'')$ over the free monoid $A^*$, by defining

$U'' = U'_1 \# \cdots \# U'_s$

$V'' = V'_1 \# \cdots \# V'_s$.

Thus we have deterministically reduced the equation $(U, V)$ to the equation $(U'', V'')$,
where

$|U''V''| \le 15|UV|$

since each $U'_i V'_i$ has length at most 3 and we have inserted $2s - 2$ copies of the
letter $\#$. This finishes the proof of Theorem 3 for the group case.
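The final encoding and its length bound can be sketched as follows. The representation of each triangular equation as a pair of strings is an assumption of this illustration, and the name `encode_system` is hypothetical.

```python
def encode_system(system, separator="#"):
    """Join the left-hand and right-hand sides of a system with separators.

    `system` is a list of pairs (U_i, V_i) of strings; the result is the
    single equation (U'', V'') described in the text.
    """
    lhs = separator.join(u for u, _ in system)
    rhs = separator.join(v for _, v in system)
    return lhs, rhs


system = [("PR", "x"), ("rQ", "y"), ("PQ", "z")]
lhs, rhs = encode_system(system)
total_length = len(lhs) + len(rhs)
```

With $s$ equations of length at most 3 each and $2s - 2$ separators, the total length is at most $5s - 2$, which together with $s \le 3|UV|$ gives the stated bound $15|UV|$.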

Remark 41. Since the length of the word equation obtained from a free group
equation of length $n$ is at most $15n$, an upper bound for the size of the alphabet $C$
in the statement of Theorem 3 in the free group case is $615 \cdot 15n = 9225n$.

5. Example of preprocessing, block and pair compression procedures 

We conclude with a demonstration of the procedures described in Subsection 3.10
with a simple example. Suppose we have a single equation $(U, V)$ in a free monoid
with involution with

$U = XaYbaXP$ and $V = bYb^3ZQ$.

For simplicity we have chosen an equation with no involuted letters. Suppose also
that we know a solution

$\sigma(X) = b^5$, $\sigma(Y) = b^4a$, $\sigma(Z) = bab$, $\sigma(P) = ab^3a$, $\sigma(Q) = ab^5ab^3a$.

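We depict the situation as follows:

$\sigma(U)$ = bbbbb · a · bbbba · ba · bbbbb · abbba     (factors $X$, $a$, $Y$, $ba$, $X$, $P$)

$\sigma(V)$ = b · bbbba · bbb · bab · abbbbbabbba     (factors $b$, $Y$, $b^3$, $Z$, $Q$)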

For simplicity, we will ignore the rest of the word $W_{\mathrm{init}}$, and focus just on the
factor $U$.

We first follow the preprocessing step on page 22. In this case we pop the first
and last letter of each variable.

Next we enter block compression. In step (1) we compute $\Lambda_a = \emptyset$, $\Lambda_b = \{4, 5\}$.
Note that $3 \notin \Lambda_b$ since the factor $b^3$ is completely inside $P$ and $Q$ so is not visible.
The block compression process will not touch this factor. We also compute $\mathcal{X}_a = \emptyset$
and $\mathcal{X}_b = \{X, Y\}$. Note that $P \notin \mathcal{X}_b$ since it is preceded by $a$ in $W$.

Step (2) introduces the fresh letters $c_b$, $c_{4,b}$, $c_{5,b}$, and renames the letters $b$ that
are part of a visible block of length at least 2 as $c_b$:


A Y X p 

Cb Cb Cb Cb Cb a Cb Cb Cb Cb a b a Cb Cb Cb Cb Cb a b^ a. 

z 


Y 


Q 


In step (3) we split the variables $X \to X'X$, $Y \to Y'Y$, then remove $X$, $Y$
since $\sigma(X) = 1 = \sigma(Y)$:

X' y' X' 

Cb Cb Cb Cb Cb a Cb Cb Cb Cb ci b a Cb Cb Cb Cb Cb ci b^ a. 

Y' z Q 


Note that $Q$ does not belong to $\mathcal{X}_b$, so it does not split even though $\sigma(Q)$ starts
with $c_b$.

Step (4) renames one of the $c_b$ in each block in both $W$ and $\sigma(W)$:


X' 


Y' 


X' 


Cb,b Cb Cb Cb Cb d C4^b Cb Cb Cb d b d C^^b Cb Cb Cb Cb d b d. 
Y’ Z Q 
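
The effect of steps (2) and (4) on the plain word can be sketched in Python as follows. This is a simplification and not the procedure itself: it ignores variables, and it treats a maximal block of b as visible exactly when its length is 4 or 5, which happens to hold in this example. The token names `c`, `c4`, `c5` stand for c_b, c_{4,b}, c_{5,b} and are illustrative.

```python
from itertools import groupby

def rename_blocks(word, letter, visible_lengths):
    """Rename each maximal block letter^n with n in visible_lengths.

    The first letter of the block becomes the marked token 'c<n>' and the
    remaining n - 1 letters become the generic token 'c'.
    """
    out = []
    for symbol, group in groupby(word):
        n = len(list(group))
        if symbol == letter and n in visible_lengths:
            out.append(f"c{n}")
            out.extend(["c"] * (n - 1))
        else:
            out.extend([symbol] * n)  # block left untouched
    return out

word = "bbbbbabbbbababbbbbabbba"
renamed = rename_blocks(word, "b", {4, 5})
print(" ".join(renamed))
```

The single b and the block b^3 are left unchanged, matching the word displayed after step (4).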


We now enter the loop in step (5). We write c = c_b, c_λ = c_{λ,b}. 

X' Y' X' P 

c_5 c c c c a c_4 c c c a b a c_5 c c c c a b^3 a. 

Y' Z Q 

Since θ(X') = θ(Y') = c we pop each to make the number of c letters in each 
σ(X) even: 

c_5 c c c c a c_4 c c c a b a c_5 c c c c a b^3 a. 

Y' Z Q 


Note that we have used the fact that X', Y' commute with c in the partially 
commutative monoid. 

We are now at part (d) of step (6). Since c_4 c^3 is a factor where the number of c 
letters is odd, we follow the compression transition h(c_4) = c_4 c to obtain: 

c_5 c c c c a c_4 c c a b a c_5 c c c c a b^3 a. 

Y' Z Q 


We now have all blocks of c inside variables and in W of even length, so we can 
finally follow the block compression transition h(c) = cc to reduce the number of c 
letters by half: 

X' X' 

c_5 c c a c_4 c a b a c_5 c c a b^3 a. 

Q 


Since there are still c letters remaining in σ(W) we repeat the loop, and after 
two more iterations of the loop we obtain: 

c_5 a c_4 a b a c_5 a b^3 a. 

Q 
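
The loop can be replayed on the word alone with a short Python sketch. It is a simplification under stated assumptions: variables, types and popping are ignored, every run of c is assumed to follow a marked letter (as is the case after step (4)), and the token names `c`, `c4`, `c5` stand for c, c_4, c_5. Applying h(c_λ) = c_λ c to an odd run and then h(c) = cc to the remaining even run is done in one pass.

```python
def halving_round(word):
    """One iteration of the loop on a list of tokens."""
    out, i = [], 0
    while i < len(word):
        tok = word[i]
        i += 1
        if tok == "c":
            raise ValueError("a run of c is assumed to follow a marked letter")
        out.append(tok)
        if tok.startswith("c"):  # marked letter such as 'c4' or 'c5'
            run = 0
            while i < len(word) and word[i] == "c":
                run += 1
                i += 1
            run -= run % 2                  # odd run: the marked letter absorbs one c
            out.extend(["c"] * (run // 2))  # h(c) = cc halves the even run
    return out

word = ["c5", "c", "c", "c", "c", "a", "c4", "c", "c", "c", "a", "b", "a",
        "c5", "c", "c", "c", "c", "a", "b", "b", "b", "a"]
rounds = 0
while "c" in word:
    word = halving_round(word)
    rounds += 1
print(rounds, " ".join(word))  # 3 c5 a c4 a b a c5 a b b b a
```

Three iterations suffice here, in line with the first iteration followed by two more.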


At this point we have removed all letters c_b, so the loop terminates. We reduce 
the alphabet by removing c_b, and remove the types. Note that we keep each c_{λ,b} 
since each letter represents a different length block of b's, and therefore they are all 
different. Let us rename c_{5,b} = d and c_{4,b} = e. So the equation is now: 

P 

d a e a b a d a b b b a. 

Z Q 

As promised, W contains no proper factors b^2 for any b ∈ B, so we can start pair 
compression. 

Suppose we choose a partition of B \ {#} as B_+ = {ā, b, d, e} and B_- = {a, b̄, d̄, ē} 
(we suppose this choice is maximal according to Remark 37). In step (1) of pair 
compression we introduce fresh letters c_ba, c_da, c_ea, then in step (2) we create the 
list C = {Z, P, Q}. (We will continue to ignore involutions, and focus just on a 
factor of W containing no involuted letters or variables). We perform uncrossing 
by popping a from Z and removing Z, and since we follow P → bP then we also 
follow P → Pb, and similarly for Q, leading to: 

P 

d a e a b a d a b b b a. 

Q 

In step (3) we follow compression transitions h(c_ba) = ba, h(c_da) = da, h(c_ea) = 
ea to obtain: 

c_da c_ea c_ba c_da b b c_ba. 

Q 
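
The compression in step (3) amounts to reading the transitions h backwards on the word. A minimal Python sketch, assuming that b, d, e act as left letters and a as the right letter (as the fresh letters c_ba, c_da, c_ea suggest) and ignoring variables and involuted letters; the function name `compress_pairs` is hypothetical.

```python
def compress_pairs(word, left, right):
    """Replace each factor xy with x in left and y in right by a token c_xy.

    Because left and right are disjoint, two such factors never overlap,
    so a single left-to-right scan is enough.
    """
    out, i = [], 0
    while i < len(word):
        if i + 1 < len(word) and word[i] in left and word[i + 1] in right:
            out.append("c_" + word[i] + word[i + 1])
            i += 2
        else:
            out.append(word[i])
            i += 1
    return out

word = list("daeabadabbba")
compressed = compress_pairs(word, left={"b", "d", "e"}, right={"a"})
print(" ".join(compressed))  # c_da c_ea c_ba c_da b b c_ba
```

The two letters b that are not followed by a survive, which is why a further round of block compression is needed.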

This completes one round of the process. We then return to the preprocessing 
step, which gives: 

c_da c_ea c_ba c_da b b c_ba, 

and then block compression would produce: 

c_da c_ea c_ba c_da c_{2,b} c_ba. 